migtissera committed
Commit 39eb076 (1 parent: 1e3b612)

Update README.md

README.md CHANGED
@@ -47,7 +47,7 @@ Since the model is trained to use test-time-compute, the evalutations were perfo
 | MMLU | 81.6% | - | 82.0% |
 | MATH | 64.2% | 69.4% | 70.2% |
 | MMLU-Pro | 65.6% | 65.0% | - |
-| HumanEval |
+| HumanEval | 61.0% | 88.1% | 87.2% |
 
 The evaluations were performed using a fork of Glaive's `simple-evals` codebase. Many thanks to @winglian for performing the evals. The codebase for evaluations can be found here: https://github.com/winglian/simple-evals
 