open-thoughts/OpenThinker-7B

@@ -26,15 +26,23 @@ The dataset is derived by distilling DeepSeek-R1 using the [data pipeline availa
 More info about the dataset can be found on the dataset card at [OpenThoughts-114k dataset](https://huggingface.co/datasets/open-thoughts/open-thoughts-114k).
 This model improves upon the [Bespoke-Stratos-7B model](https://huggingface.co/bespokelabs/Bespoke-Stratos-7B), which used 17k examples ([Bespoke-Stratos-17k dataset](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k)).
-|  | AIME24 | MATH500 | GPQA-D | LCB Easy | LCB Med | LCB Hard | Open Weights | Open Data | Open Code |
-|--|--------|---------|--------|----------|---------|----------|--------------|-----------| --------- |
-|OpenThinker-7B|26.7|86|40.4|74.2|31.1|3.3|✔|[✔](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k)|[✔](https://github.com/open-thoughts/open-thoughts) |
-|Bespoke-Stratos-7B|20.0|82.0|37.8|71.4|25.2|1.6|✔|[✔](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k)|[✔](https://github.com/bespokelabsai/curator/tree/main/examples/bespoke-stratos-data-generation)|
-|DeepSeek-R1-Distill-Qwen-7B|43.3|89.4|44.9|81.3|42.2|2.4|✔|❌|❌|
-|gpt-4o-0513|9.3|74.6|49.9|-|-|-|❌|❌|❌|
-|o1-mini|63.6|90.0|60.0|-|-|-|❌|❌|❌|
+The numbers reported in the table below were evaluated independently with our open-source tool [Evalchemy](https://github.com/mlfoundations/Evalchemy).
+|                             | AIME2024 | MATH500 | GPQA-Diamond | LCB Easy v2 | LCB Medium v2 | LCB Hard v2 | LCB All v2 |
+| --------------------------- | -------- | ------- | ------------ | ----------- | ------------- | ----------- | ---------- |
+| OpenThinker-7B              | 43.3     | 83.0    | 42.4         | 75.3        | 28.6          | 6.5         | 39.9       |
+| Bespoke-Stratos-7B          | 16.6     | 79.6    | 38.9         | 71.4        | 25.2          | 0.8         | 35.8       |
+| DeepSeek-R1-Distill-Qwen-7B | 60       | 88.2    | 46.9         | 79.7        | 45.1          | 14.6        | 50.1       |
+| gpt-4o-0513                 | 10       | 75.8    | 46.5         | 87.4        | 42.7          | 8.9         | 50.5       |
+| o1-mini                     | 63       |         | 60           | 92.8        | 74.7          | 39.8        | 72.8       |
+
+|  | Open Weights | Open Data | Open Code |
+|--|--------------|-----------| --------- |
+|OpenThinker-7B|✔|[✔](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k)|[✔](https://github.com/open-thoughts/open-thoughts) |
+|Bespoke-Stratos-7B|✔|[✔](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k)|[✔](https://github.com/bespokelabsai/curator/tree/main/examples/bespoke-stratos-data-generation)|
+|DeepSeek-R1-Distill-Qwen-7B|✔|❌|❌|
+|gpt-4o-0513|❌|❌|❌|
+|o1-mini|❌|❌|❌|
 ## Intended uses & limitations