|
--- |
|
language: |
|
- en |
|
license: mit |
|
library_name: transformers |
|
tags: |
|
- orpo |
|
- qwen2 |
|
- sft |
|
- chatml |
|
base_model: |
|
- MaziyarPanahi/calme-2.4-rys-78b |
|
datasets: |
|
- mlabonne/orpo-dpo-mix-40k |
|
pipeline_tag: text-generation |
|
inference: false |
|
model_creator: dfurman |
|
quantized_by: dfurman |
|
model-index:
- name: CalmeRys-78B-Orpo-v0.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 81.63
      name: strict accuracy
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/CalmeRys-78B-Orpo-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 61.92
      name: normalized accuracy
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/CalmeRys-78B-Orpo-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 37.92
      name: exact match
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/CalmeRys-78B-Orpo-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 20.02
      name: acc_norm
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/CalmeRys-78B-Orpo-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 36.37
      name: acc_norm
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/CalmeRys-78B-Orpo-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 66.8
      name: accuracy
    source:
      url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/CalmeRys-78B-Orpo-v0.1
      name: Open LLM Leaderboard
|
--- |
|
|
|
|
|
# dfurman/CalmeRys-78B-Orpo-v0.1 |
|
|
|
This model is a finetune of `MaziyarPanahi/calme-2.4-rys-78b` on 1.5k rows of the `mlabonne/orpo-dpo-mix-40k` dataset. It was trained as a generalist language model for a variety of text generation use cases, including agentic capabilities, roleplaying, reasoning, multi-turn conversation, long-context coherence, and more.
|
|
|
As of October 2024, this is the top-ranking model on the [Open LLM Leaderboard](https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard) 🏆.
|
|
|
Thanks go out to [mlabonne](https://huggingface.co./mlabonne), [MaziyarPanahi](https://huggingface.co./MaziyarPanahi), et al. for the source dataset and base model.
|
|
|
## 🦾 Training |
|
|
|
You can find the experiment on W&B at this [link](https://wandb.ai/dryanfurman/huggingface/runs/1w50nu70?nw=nwuserdryanfurman). Here are a few visualizations: |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62afc20ca5bd7cef3e1ab3f4/NG5WGL0ljzLsNhSBRVqnD.png) |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62afc20ca5bd7cef3e1ab3f4/Zhk5Bpr1I2NrzX98Bhtp8.png) |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62afc20ca5bd7cef3e1ab3f4/WgnKQnYIFWkCRSW3JPVAb.png) |
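For context, the sketch below shows how an ORPO fine-tune like this one can be set up with TRL's `ORPOTrainer`. It is a minimal, hypothetical illustration rather than the actual training script: the hyperparameters are placeholders (the real configuration is in the W&B run linked above), a 78B model needs multi-GPU sharding (e.g., FSDP or DeepSpeed) or parameter-efficient fine-tuning in practice, and depending on your TRL version you may need to pre-apply the chat template to the `chosen`/`rejected` columns.

```python
# Hypothetical ORPO fine-tuning sketch with TRL; hyperparameters are illustrative, not the actual run config.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "MaziyarPanahi/calme-2.4-rys-78b"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto", device_map="auto")

# The card states ~1.5k rows of mlabonne/orpo-dpo-mix-40k were used
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train").select(range(1500))

config = ORPOConfig(
    output_dir="CalmeRys-78B-Orpo-v0.1",
    beta=0.1,  # weight of the odds-ratio (preference) term relative to the SFT loss
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    max_length=2048,
    max_prompt_length=1024,
    num_train_epochs=1,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL versions take processing_class= instead
)
trainer.train()
```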
|
|
|
|
|
## 💻 Usage |
|
|
|
<details> |
|
|
|
<summary>Setup</summary> |
|
|
|
```python |
|
!pip install -qU transformers accelerate bitsandbytes |
|
!huggingface-cli download dfurman/CalmeRys-78B-Orpo-v0.1 |
|
``` |
|
|
|
```python |
|
from transformers import AutoTokenizer, BitsAndBytesConfig
import transformers
import torch


# Use FlashAttention 2 and bfloat16 on Ampere (compute capability 8.0) or newer GPUs
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install -qqq flash-attn
    attn_implementation = "flash_attention_2"
    torch_dtype = torch.bfloat16
else:
    attn_implementation = "eager"
    torch_dtype = torch.float16

# # quantize if necessary
# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch_dtype,
#     bnb_4bit_use_double_quant=True,
# )

model = "dfurman/CalmeRys-78B-Orpo-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch_dtype,
        # "quantization_config": bnb_config,
        "device_map": "auto",
        "attn_implementation": attn_implementation,
    },
)
|
``` |
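Note that with roughly 78B parameters, the bf16 weights alone occupy on the order of 156 GB (about 2 bytes per parameter), so you will generally need several GPUs with `device_map="auto"`, or the commented-out 4-bit `bnb_config` above (roughly 40 GB for the weights, plus activation and cache overhead), to load the model.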
|
|
|
</details> |
|
|
|
### Example 1 |
|
|
|
```python |
|
question = "Is the number 9.11 larger than 9.9?" |
|
|
|
messages = [ |
|
{"role": "system", "content": "You are a helpful assistant that thinks step by step."}, |
|
{"role": "user", "content": question}, |
|
] |
|
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
|
# print("***Prompt:\n", prompt) |
|
|
|
outputs = pipeline( |
|
prompt, max_new_tokens=1000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95 |
|
) |
|
print("***Generation:") |
|
print(outputs[0]["generated_text"][len(prompt) :]) |
|
``` |
|
|
|
``` |
|
***Generation: |
|
To compare these two numbers, it's important to look at their decimal places after the whole number part, which is 9 in both cases. Comparing the tenths place, 9.11 has a '1' and 9.9 has a '9'. Since '9' is greater than '1', 9.9 is larger than 9.11. |
|
``` |
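Uncommenting the `print("***Prompt:\n", prompt)` line shows the fully rendered prompt. Since the base model descends from Qwen2 and uses a ChatML-style template (see the `chatml` tag above), the rendered prompt should look roughly like the block below; the tokenizer's `chat_template` is the source of truth, so treat this as an illustration.

```
<|im_start|>system
You are a helpful assistant that thinks step by step.<|im_end|>
<|im_start|>user
Is the number 9.11 larger than 9.9?<|im_end|>
<|im_start|>assistant
```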
|
|
|
### Example 2 |
|
|
|
```python |
|
question = """The bakers at the Beverly Hills Bakery baked 200 loaves of bread on Monday morning. |
|
They sold 93 loaves in the morning and 39 loaves in the afternoon. |
|
A grocery store then returned 6 unsold loaves back to the bakery. |
|
How many loaves of bread did the bakery have left? |
|
Respond as succinctly as possible. Format the response as a completion of this table: |
|
|step|subquestion|procedure|result| |
|
|:---|:----------|:--------|:-----:|""" |
|
|
|
|
|
messages = [ |
|
{"role": "system", "content": "You are a helpful assistant."}, |
|
{"role": "user", "content": question}, |
|
] |
|
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
|
# print("***Prompt:\n", prompt) |
|
|
|
outputs = pipeline(prompt, max_new_tokens=1000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95) |
|
print("***Generation:") |
|
print(outputs[0]["generated_text"][len(prompt):]) |
|
|
|
``` |
|
|
|
``` |
|
***Generation: |
|
|1|Calculate total sold|Add morning and afternoon sales|132| |
|
|2|Subtract sold from total|200 - 132|68| |
|
|3|Adjust for returns|Add returned loaves to remaining|74| |
|
``` |
|
|
|
### Example 3 |
|
|
|
```python |
|
question = "What's a good recipe for a spicy margarita?" |
|
|
|
messages = [ |
|
{"role": "system", "content": "You are a helpful assistant."}, |
|
{"role": "user", "content": question}, |
|
] |
|
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
|
# print("***Prompt:\n", prompt) |
|
|
|
outputs = pipeline(prompt, max_new_tokens=1000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95) |
|
print("***Generation:") |
|
print(outputs[0]["generated_text"][len(prompt):]) |
|
``` |
|
|
|
``` |
|
***Generation: |
|
To make a Spicy Margarita, you'll need to incorporate a chili or pepper element into your classic margarita recipe. Here’s a simple way to do it: |
|
|
|
### Ingredients: |
|
- 2 oz tequila (blanco or reposado) |
|
- 1 oz fresh lime juice |
|
- 1/2 oz triple sec (Cointreau or Grand Marnier) |
|
- 1/2 oz agave syrup or simple syrup |
|
- 1-2 slices of jalapeño (or more depending on how spicy you like it) |
|
- Salt and/or chili powder for rimming the glass |
|
- Ice |
|
- Lime wheel for garnish |
|
|
|
### Instructions: |
|
1. **Muddle Jalapeño**: In a shaker, muddle the jalapeño slices slightly. This will release the oils and heat from the peppers. |
|
2. **Add Remaining Ingredients**: Add the tequila, lime juice, triple sec, and agave syrup or simple syrup. |
|
3. **Shake and Strain**: Fill the shaker with ice and shake vigorously until cold. Strain into a salt and/or chili powder rimmed glass filled with ice. |
|
4. **Garnish and Serve**: Garnish with a lime wheel and enjoy. |
|
|
|
If you prefer a smoother spiciness that doesn't overpower the drink, you could also consider making a jalapeño-infused tequila by leaving the jalapeño slices in the bottle of tequila for several hours to a couple of days, adjusting the time based on desired level of spiciness. Then use this infused tequila instead of regular tequila in the recipe above. |
|
|
|
Another variation is to use a spicy syrup. To make this, combine equal parts water and sugar with a few sliced jalapeños in a saucepan. Bring to a boil, stirring occasionally to dissolve the sugar. Reduce heat and simmer for about 5 minutes. Let cool, strain out the jalapeños, then store in a sealed container in the refrigerator until ready to use. Use this spicy syrup instead of regular syrup in the recipe. |
|
|
|
As always, adjust the quantity of jalapeño or the type of chili used to suit your taste. Enjoy responsibly! |
|
``` |
|
|
|
|
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard) |
|
|
|
Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_dfurman__CalmeRys-78B-Orpo-v0.1).
|
|
|
| Metric              | Value |
|---------------------|------:|
| Avg.                | 50.78 |
| IFEval (0-Shot)     | 81.63 |
| BBH (3-Shot)        | 61.92 |
| MATH Lvl 5 (4-Shot) | 37.92 |
| GPQA (0-shot)       | 20.02 |
| MuSR (0-shot)       | 36.37 |
| MMLU-PRO (5-shot)   | 66.80 |
|
|
|
|