MMLU Average Score
Thanks for the detailed information on the "About" board. I am confused that there is only one MMLU score on the leaderboard, while MMLU consists of 57 tasks. How are the scores of these tasks combined into a single one? Do you just add them up and divide by 57 (the number of tasks), or is there some other trick in the calculation? Thanks.
If you run the harness as mentioned, it will provide an average score at the end :)
@clefourrier
But when I run the harness as mentioned, it only gives the results of the subtasks. Here is the command that I ran :(
python main.py --model=hf-causal-experimental --model_args="pretrained=<model_path>,use_accelerate=True" --num_fewshot=5 --device=cuda --task=hendrycksTest-* --batch_size=4 --output_path=<output_path>
Don't you have an "all" value at the end of the table displayed or in the files saved?
@clefourrier sadly, no:(((
Ha, my bad, sorry, it's an internal thing we added for logging!
We just do an average :)
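For anyone who wants to reproduce that number from the per-subtask results, here is a minimal sketch of such an average. It assumes a plain unweighted mean over the subtasks, that the results are a mapping from task name to a dict of metrics, and that the accuracy is stored under the key `acc`; the helper name `mmlu_average` and the toy numbers are made up for illustration and are not part of the harness.

```python
from statistics import mean


def mmlu_average(results, prefix="hendrycksTest-", metric="acc"):
    """Unweighted mean of `metric` over all subtasks whose name starts with `prefix`.

    `results` is assumed to map task names to dicts of metrics,
    e.g. {"hendrycksTest-anatomy": {"acc": 0.5}, ...}.
    """
    scores = [
        metrics[metric]
        for task, metrics in results.items()
        if task.startswith(prefix)
    ]
    if not scores:
        raise ValueError(f"no tasks found with prefix {prefix!r}")
    return mean(scores)


# Toy example with made-up numbers (a real run would have 57 entries).
example = {
    "hendrycksTest-abstract_algebra": {"acc": 0.25},
    "hendrycksTest-anatomy": {"acc": 0.5},
    "hendrycksTest-astronomy": {"acc": 0.75},
    "some_other_task": {"acc": 1.0},  # ignored: name does not match the prefix
}
print(mmlu_average(example))
```

With a real run, you would pass in the per-task results loaded from the file written to `<output_path>`; the exact layout of that file is an assumption here, so check it before relying on the key names.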
@clefourrier Okkkk, thanks! Hoping you have a good time!