OpenThinker-7B / README.md

Update README.md

eea5583 verified 6 days ago

4.8 kB

	---
	library_name: transformers
	license: apache-2.0
	base_model: Qwen/Qwen2.5-7B-Instruct
	tags:
	- llama-factory
	- full
	- generated_from_trainer
	model-index:
	- name: DCFT-Stratos-Verified-114k-7B-4gpus
	results: []
	datasets:
	- open-thoughts/open-thoughts-114k
	---

	<p align="center">
	<img src="https://huggingface.co./datasets/open-thoughts/open-thoughts-114k/resolve/main/open_thoughts.png" width="50%">
	</p>

	# OpenThinker-7B

	This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co./Qwen/Qwen2.5-7B-Instruct) on the
	[OpenThoughts-114k dataset](https://huggingface.co./datasets/open-thoughts/OpenThoughts-114k) dataset.

	The dataset is derived by distilling DeepSeek-R1 using the [data pipeline available on github](https://github.com/open-thoughts/open-thoughts).
	More info about the dataset can be found on the dataset card at [OpenThoughts-114k dataset](https://huggingface.co./datasets/open-thoughts/open-thoughts-114k).

	This model improves upon the [Bespoke-Stratos-7B model](https://huggingface.co./bespokelabs/Bespoke-Stratos-7B), which used 17k examples ([Bespoke-Stratos-17k dataset](https://huggingface.co./datasets/bespokelabs/Bespoke-Stratos-17k)).
	The numbers reported in the table below are evaluated with our open-source tool [Evalchemy](https://github.com/mlfoundations/Evalchemy).

	\| \| AIME24 \| MATH500 \| GPQA-Diamond \| LCBv2 Easy \| LCBv2 Medium \| LCBv2 Hard \| LCBv2 All \|
	\| --------------------------- \| -------- \| ------- \| ------------ \| ----------- \| ------------- \| ----------- \| ---------- \|
	\| OpenThinker-7B \| 43.3 \| 83.0 \| 42.4 \| 75.3 \| 28.6 \| 6.5 \| 39.9 \|
	\| Bespoke-Stratos-7B \| 16.6 \| 79.6 \| 38.9 \| 71.4 \| 25.2 \| 0.8 \| 35.8 \|
	\| DeepSeek-R1-Distill-Qwen-7B \| 60 \| 88.2 \| 46.9 \| 79.7 \| 45.1 \| 14.6 \| 50.1 \|
	\| gpt-4o-0513 \| 10 \| 75.8 \| 46.5 \| 87.4 \| 42.7 \| 8.9 \| 50.5 \|
	\| o1-mini \| 63 \| 85.6 \| 60 \| 92.8 \| 74.7 \| 39.8 \| 72.8 \|

	We are fully open-source. Our [model weights](https://huggingface.co./open-thoughts), [datasets](https://huggingface.co./open-thoughts), [data generation code](https://github.com/open-thoughts/open-thoughts), [evaluation code](https://github.com/mlfoundations/Evalchemy), and [training code](https://github.com/hiyouga/LLaMA-Factory) are all publicly available.

	\| \| Open Weights \| Open Data \| Open Code \|
	\|--\|--------------\|-----------\| --------- \|
	\|OpenThinker-7B\|✅\|[✅](https://huggingface.co./datasets/open-thoughts/OpenThoughts-114k)\|[✅](https://github.com/open-thoughts/open-thoughts) \|
	\|Bespoke-Stratos-7B\|✅\|[✅](https://huggingface.co./datasets/bespokelabs/Bespoke-Stratos-17k)\|[✅](https://github.com/bespokelabsai/curator/tree/main/examples/bespoke-stratos-data-generation)\|
	\|DeepSeek-R1-Distill-Qwen-7B\|✅\|❌\|❌\|
	\|gpt-4o-0513\|❌\|❌\|❌\|❌\|
	\|o1-mini\|❌\|❌\|❌\|❌\|


	## Intended uses & limitations

	Apache 2.0 License


	## Training procedure

	We used four 8xH100 nodes to train the model for 20 hours.

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 1
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 32
	- gradient_accumulation_steps: 3
	- total_train_batch_size: 96
	- total_eval_batch_size: 256
	- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3.0

	### Framework versions

	- Transformers 4.46.1
	- Pytorch 2.3.0
	- Datasets 3.1.0
	- Tokenizers 0.20.3

	More info can be found in our repository: [https://github.com/open-thoughts/open-thoughts](https://github.com/open-thoughts/open-thoughts).

	# Links
	- 📊 [Open Thoughts Launch Blog Post](https://www.open-thoughts.ai/blog/launch)
	- 📊 [Open Thoughts GitHub Repository](https://github.com/open-thoughts/open-thoughts)
	- 🧠 [OpenThoughts-114k dataset](https://huggingface.co./datasets/open-thoughts/OpenThoughts-114k)
	- 🤖 [OpenThinker-7B model](https://huggingface.co./open-thoughts/OpenThinker-7B) - this model.
	- 📊 [Bespoke-Stratos Blog Post](https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-of-reasoning-distillation)
	- 🧠 [Bespoke-Stratos-17k dataset](https://huggingface.co./datasets/bespokelabs/Bespoke-Stratos-17k)
	- 🤖 [Bespoke-Stratos-32B model](https://huggingface.co./bespokelabs/Bespoke-Stratos-32B)
	- 🤖 [Bespoke-Stratos-7B model](https://huggingface.co./bespokelabs/Bespoke-Stratos-7B)