Fizzarolli committed on
Commit 7c9d7ff · verified · 1 Parent(s): 7f5ea76

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -11,12 +11,12 @@ base_model: Qwen/Qwen2.5-7B
  A replication attempt of Tulu 3 on the Qwen 2.5 base models.
 
  ## Evals (so far)
- | | Teleut 7B (measured) | Tülu 3 SFT 8B (reported) | Qwen 2.5 7B Instruct (reported) | Ministral 8B | Mistral 7B v0.3 (reported)
- |-------------------------|----------------------|--------------------------|---------------------------------|--------------|---------------------------
- |IFEval (prompt loose) |66.3% |72.8% |**74.7%** |56.4% |53.0%
- |BBH (3 shot, CoT) |64.4% |**67.9%** |21.7% |56.2% |47.0%<sup>NLL</sup>
- |MMLU Pro (0 shot, CoT) |xx.x% |xx.x% |56.3%<sup>Unknown</sup> |xx.x% |30.7%<sup>5-shot</sup>
- |AlpacaEval 2 (LC winrate)|xx.x% |12.4% |29.0% |31.4% |xx.x%
+ | | Teleut 7B (measured) | Tülu 3 SFT 8B (reported) | Qwen 2.5 7B Instruct (reported) | Ministral 8B (reported) | Mistral 7B v0.3 (reported)
+ |-------------------------|----------------------|--------------------------|---------------------------------|-------------------------|---------------------------
+ |IFEval (prompt loose) |66.3% |72.8% |**74.7%** |56.4% |53.0%
+ |BBH (3 shot, CoT) |64.4% |**67.9%** |21.7% |56.2% |47.0%<sup>NLL</sup>
+ |MMLU Pro (0 shot, CoT) |xx.x% |xx.x% |56.3%<sup>Unknown</sup> |xx.x% |30.7%<sup>5-shot</sup>
+ |AlpacaEval 2 (LC winrate)|xx.x% |12.4% |29.0% |31.4% |xx.x%
 
  ## Credits
  Big thanks to Retis Labs for being providing my 8xH100 polycule used to train and test this model!