leaderboard-pr-bot's picture
Adding Evaluation Results
f45ca1b verified
|
raw
history blame
8 kB
---
license: apache-2.0
datasets:
- mlabonne/orpo-dpo-mix-40k
model-index:
- name: NeuralLLaMa-3-8b-ORPO-v0.3
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
value: 69.54
name: normalized accuracy
source:
url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
value: 84.9
name: normalized accuracy
source:
url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 68.39
name: accuracy
source:
url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
value: 60.82
source:
url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
value: 79.4
name: accuracy
source:
url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 72.93
name: accuracy
source:
url: https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: HuggingFaceH4/ifeval
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 52.76
name: strict accuracy
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: BBH
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 22.39
name: normalized accuracy
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: hendrycks/competition_math
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 3.47
name: exact match
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 0.0
name: acc_norm
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 3.65
name: acc_norm
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 22.85
name: accuracy
source:
url: https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3
name: Open LLM Leaderboard
---
# NeuralLLaMa-3-8b-ORPO-v0.3
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64d71ab4089bc502ceb44d29/JyQNE7gAAyYTxKMO2PraO.png)
```python
!pip install -qU transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, BitsAndBytesConfig
import torch
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
MODEL_NAME = 'Kukedlc/NeuralLLaMa-3-8b-ORPO-v0.3'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='cuda:0', quantization_config=bnb_config)
prompt_system = "Sos un modelo de lenguaje de avanzada que habla español de manera fluida, clara y precisa.\
Te llamas Roberto el Robot y sos un aspirante a artista post moderno"
prompt = "Creame una obra de arte que represente tu imagen de como te ves vos roberto como un LLm de avanzada, con arte ascii, mezcla diagramas, ingenieria y dejate llevar"
chat = [
{"role": "system", "content": f"{prompt_system}"},
{"role": "user", "content": f"{prompt}"},
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat, return_tensors="pt").to('cuda')
streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=1024, do_sample=True, temperature=0.3, repetition_penalty=1.2, top_p=0.9,)
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_Kukedlc__NeuralLLaMa-3-8b-ORPO-v0.3)
| Metric |Value|
|---------------------------------|----:|
|Avg. |72.66|
|AI2 Reasoning Challenge (25-Shot)|69.54|
|HellaSwag (10-Shot) |84.90|
|MMLU (5-Shot) |68.39|
|TruthfulQA (0-shot) |60.82|
|Winogrande (5-shot) |79.40|
|GSM8k (5-shot) |72.93|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_Kukedlc__NeuralLLaMa-3-8b-ORPO-v0.3)
| Metric |Value|
|-------------------|----:|
|Avg. |17.52|
|IFEval (0-Shot) |52.76|
|BBH (3-Shot) |22.39|
|MATH Lvl 5 (4-Shot)| 3.47|
|GPQA (0-shot) | 0.00|
|MuSR (0-shot) | 3.65|
|MMLU-PRO (5-shot) |22.85|