This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
The dataset is derived by distilling DeepSeek-R1 using the data pipeline of Berkeley NovaSky’s Sky-T1 with some modifications. More info in the dataset card at [Bespoke-Stratos-17k](https://huggingface.co/datasets/Bespoke-Stratos-17k).

It outperforms Qwen-2.5-7B-Instruct on math reasoning benchmarks:

||Bespoke-Stratos-7B|Qwen2.5-7B-Instruct|DeepSeek-R1-Distill-Qwen-7B (Ours)|DeepSeek-R1-Distill-Qwen-7B (Reported)|
|---|---|---|---|---|
|AIME2024|20.0|10.0|43.3|55.5|
|MATH500|82.0|74.2|89.4|83.3|
|GPQA-Diamond|37.8|33.3|44.9|49.1|
|LiveCodeBench v2 Easy|71.4|65.9|81.3|-|
|LiveCodeBench v2 Medium|25.5|18.9|42.2|-|
|LiveCodeBench v2 Hard|1.6|3.3|2.4|-|
|LiveCodeBench v2 All|36.1|31.9|46.6|-|
Note that the authors of Sky-T1 [noted](https://github.com/NovaSky-AI/SkyThought/issues/4#issuecomment-2585860004) that they saw little or no improvement when training 7B or 14B models with their data.
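For reference, the model can be queried like any chat model via Hugging Face `transformers`. The sketch below is a minimal, hedged example: the repo id `bespokelabs/Bespoke-Stratos-7B` and the generation budget are assumptions for illustration, not confirmed by this card; substitute the actual model id.

```python
# Minimal inference sketch using Hugging Face transformers.
# NOTE: the repo id below is an assumed placeholder; substitute the actual
# Hugging Face model id for Bespoke-Stratos-7B.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bespokelabs/Bespoke-Stratos-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "What is the sum of the first 100 positive integers?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning-distilled models emit long chains of thought before the final
# answer, so allow a generous new-token budget.
output_ids = model.generate(input_ids, max_new_tokens=4096)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since the model is distilled from DeepSeek-R1 traces, expect the decoded output to contain an extended reasoning section before the boxed final answer.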