Update README.md
README.md CHANGED
@@ -11,12 +11,12 @@ base_model: Qwen/Qwen2.5-7B
A replication attempt of Tulu 3 on the Qwen 2.5 base models.

## Evals (so far)
-|                         | Teleut 7B (measured) | Tülu 3 SFT 8B (reported) | Qwen 2.5 7B Instruct (reported) | Ministral 8B | Mistral 7B v0.3 (reported)
-
-|IFEval (prompt loose)    |66.3% |72.8% |**74.7%** |56.4%
-|BBH (3 shot, CoT)        |64.4% |**67.9%** |21.7% |56.2%
-|MMLU Pro (0 shot, CoT)   |xx.x% |xx.x% |56.3%<sup>Unknown</sup> |xx.x%
-|AlpacaEval 2 (LC winrate)|xx.x% |12.4% |29.0% |31.4%
+|                         | Teleut 7B (measured) | Tülu 3 SFT 8B (reported) | Qwen 2.5 7B Instruct (reported) | Ministral 8B (reported) | Mistral 7B v0.3 (reported)
+|-------------------------|----------------------|--------------------------|---------------------------------|-------------------------|---------------------------
+|IFEval (prompt loose)    |66.3% |72.8% |**74.7%** |56.4% |53.0%
+|BBH (3 shot, CoT)        |64.4% |**67.9%** |21.7% |56.2% |47.0%<sup>NLL</sup>
+|MMLU Pro (0 shot, CoT)   |xx.x% |xx.x% |56.3%<sup>Unknown</sup> |xx.x% |30.7%<sup>5-shot</sup>
+|AlpacaEval 2 (LC winrate)|xx.x% |12.4% |29.0% |31.4% |xx.x%

## Credits
Big thanks to Retis Labs for providing the 8xH100 polycule used to train and test this model!