---
base_model: EpistemeAI/OpenReasoner-Llama-3.2-3B-rs1.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: llama3.2
language:
- en
---

## Model Introduction
An experimental model utilizes a unique, advanced form of supervised tuning. This training program loads the model and then loads the data from the dataset. It provides the data during inference time. Then, it trains the Large Language Model (LLM). During inference, it checks if the model reaches the desired answer or goal. If not, it continues training until the answer or solution is achieved.

Context Window: 128000

## Installation
Update latest transformers
```python
pip install -U transformers
```

System prompt suggested for math: 
```python

system_prompt="""
Please reason step by step, and put your final answer within \boxed{}
Respond in the following format:
<problem>
...
</problem>
<solution>
...
</solution>"""
```


Inference
```python
from transformers import pipeline
model_id = "EpistemeAI/OpenReasoner-Llama-3.2-3B-rs1.01"
pipe = pipeline(
    "text-generation", 
    model=model_id, 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)
print(pipe("What is larger 9.9 or 9.11?"))
```

## Reference
Thank you so much to Hugging Face H4 and the dataset: [Math-500](https://huggingface.co./datasets/HuggingFaceH4/MATH-500)
  
We use this as evaluator. It was not directly trained, it was used as a test


# Uploaded  model

- **Developed by:** EpistemeAI
- **License:** apache-2.0
- **Finetuned from model :** EpistemeAI/OpenReasoner-Llama-3.2-3B-rs1.0

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)