---
base_model: unsloth/llama-3.3-70b-instruct-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
datasets:
- codelion/Sky-T1_data_17k
- codelion/optillm-router-dataset
metrics:
- accuracy
---

# Llama-3.3-70B-o1 Thinker Model

This model was fine-tuned on CoT-style reasoning traces. It responds with a _thinking_ trace between the `<|begin_of_thought|>` and `<|end_of_thought|>` tags, followed by the final answer between the `<|begin_of_solution|>` and `<|end_of_solution|>` tags.

Compared to the base Llama model, this thinker model tends to generate a large number of tokens. If you are benchmarking, make sure you capture the full generated text in the response, ending with the `<|end_of_solution|>` tag. For most queries, you will need to set `max_tokens` to at least 8192.

GGUF quants for the model are available at [Llama-3.3-70B-o1-gguf](https://huggingface.co./codelion/Llama-3.3-70B-o1-gguf).

The model was trained using QLoRA fine-tuning. The adapter is available at [Llama-3.3-70B-o1-lora](https://huggingface.co./codelion/Llama-3.3-70B-o1-lora).

## Evaluation results

| Model | AIME 2024 pass@1 (%) |
|-------|----------------------|
| **Llama-3.3-70B-o1** | **46.7** |
| Llama-3.3-70B | 30.0 |
| Sky-T1-32B-Preview | 43.3 |
| o1-preview | 40.0 |
| QwQ | 50.0 |

- **Developed by:** codelion
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3.3-70b-instruct-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
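
## Example: extracting the final answer

Because the model wraps its reasoning and its answer in separate tags, benchmarking code usually needs to split the two and check that generation was not cut off. The following is a minimal sketch of one way to do that with the Python standard library. The tag names come from the description above; the helper names `split_response` and `is_complete` are hypothetical and not part of any released tooling for this model.

```python
import re

# Tag names are taken from the model card; the helpers below are illustrative.
THOUGHT_RE = re.compile(
    r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", re.DOTALL
)
SOLUTION_RE = re.compile(
    r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL
)


def split_response(text):
    """Return (thought, solution); either is None if its tag pair is missing."""
    thought = THOUGHT_RE.search(text)
    solution = SOLUTION_RE.search(text)
    return (
        thought.group(1).strip() if thought else None,
        solution.group(1).strip() if solution else None,
    )


def is_complete(text):
    """True when the text ends with the closing solution tag (not truncated)."""
    return text.rstrip().endswith("<|end_of_solution|>")


# A hand-written sample response, not real model output.
sample = (
    "<|begin_of_thought|>2 + 2 is 4.<|end_of_thought|>"
    "<|begin_of_solution|>4<|end_of_solution|>"
)
thought, solution = split_response(sample)
print(solution)
print(is_complete(sample))
```

If `is_complete` returns `False` for a benchmark sample, the response was likely truncated, and raising `max_tokens` before scoring is a reasonable first step.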