hf (dtype=bfloat16,use_cache=True,pretrained=./checkpoint-1400/,max_length=2048), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 16

|                Tasks                |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|-------------------------------------|-------|------|-----:|--------|---|-----:|---|-----:|
|leaderboard_gpqa                     |    N/A|      |      |        |   |      |   |      |
| - leaderboard_gpqa_diamond          |      1|none  |     0|acc_norm|↑  |0.3030|±  |0.0327|
| - leaderboard_gpqa_extended         |      1|none  |     0|acc_norm|↑  |0.3004|±  |0.0196|
| - leaderboard_gpqa_main             |      1|none  |     0|acc_norm|↑  |0.2969|±  |0.0216|
|leaderboard_musr                     |    N/A|      |      |        |   |      |   |      |
| - leaderboard_musr_murder_mysteries |      1|none  |     0|acc_norm|↑  |0.5400|±  |0.0316|
| - leaderboard_musr_object_placements|      1|none  |     0|acc_norm|↑  |0.3203|±  |0.0292|
| - leaderboard_musr_team_allocation  |      1|none  |     0|acc_norm|↑  |0.4080|±  |0.0311|

hf (dtype=bfloat16,use_cache=True,pretrained=./checkpoint-1400/,max_length=768), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 128

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.5974|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.5921|±  |0.0135|
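
The Value and Stderr columns can be read as an approximate confidence interval. The sketch below is not part of the harness output: `ci95` is a hypothetical helper, and the interval assumes a normal approximation (value ± 1.96 × stderr), which is only a rough guide for the smaller task subsets. The numbers are copied from the rows reported above.

```python
# Approximate 95% confidence intervals from the reported value and stderr.
# Assumption: normal approximation, i.e. value +/- 1.96 * stderr.

def ci95(value, stderr, z=1.96):
    """Return (lower, upper) bounds of an approximate 95% interval."""
    return (value - z * stderr, value + z * stderr)

# (value, stderr) pairs taken from the result tables.
results = {
    "leaderboard_gpqa_diamond": (0.3030, 0.0327),
    "leaderboard_gpqa_extended": (0.3004, 0.0196),
    "leaderboard_gpqa_main": (0.2969, 0.0216),
    "leaderboard_musr_murder_mysteries": (0.5400, 0.0316),
    "leaderboard_musr_object_placements": (0.3203, 0.0292),
    "leaderboard_musr_team_allocation": (0.4080, 0.0311),
    "gsm8k flexible-extract": (0.5974, 0.0135),
    "gsm8k strict-match": (0.5921, 0.0135),
}

for name, (value, stderr) in results.items():
    lower, upper = ci95(value, stderr)
    print(f"{name}: {value:.4f} [{lower:.4f}, {upper:.4f}]")
```

Overlapping intervals, such as the two gsm8k filters, suggest the difference between those rows is within noise under this approximation.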