How are Faithfulness and Factuality calculated?
#22, opened by UjjwalP
I am aware that metrics like ROUGE and FactKB are used to measure faithfulness and factuality, respectively. However, it is unclear how the 'Faithfulness' and 'Factuality' columns in the leaderboard were computed.
@GWHed can you chime in please?
Hi! We classify each task as either a Faithfulness task or a Factuality task based on its characteristics, and calculate the Faithfulness and Factuality scores by averaging the evaluation metrics of the tasks within each category. We are also planning to try normalising the score for each task before averaging. Detailed information, such as how we classified the tasks, can be found in our paper here: https://arxiv.org/abs/2404.05904
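For readers who want to see the averaging step concretely, here is a minimal sketch of it. This is not the leaderboard's actual code: the task names, the task-to-category mapping, and the scores are all hypothetical placeholders, and the real classification is the one described in the paper. It also assumes every task score is already on a comparable scale, since no normalisation is applied.

```python
from statistics import mean

# Hypothetical mapping from task name to category.
# The real task classification is described in the paper, not here.
TASK_CATEGORY = {
    "task_a": "Faithfulness",
    "task_b": "Faithfulness",
    "task_c": "Factuality",
    "task_d": "Factuality",
}


def category_scores(task_scores, task_category=TASK_CATEGORY):
    """Average one model's per-task scores within each category."""
    grouped = {}
    for task, score in task_scores.items():
        grouped.setdefault(task_category[task], []).append(score)
    return {category: mean(values) for category, values in grouped.items()}


# Made-up scores for a single model, one value per task.
model_scores = {"task_a": 0.5, "task_b": 0.75, "task_c": 0.25, "task_d": 0.5}
result = category_scores(model_scores)
print(result)
```

With these made-up numbers, each column is simply the unweighted mean of its two tasks. If per-task normalisation were added, it would happen before the call to `mean`.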