---
license: llama3
tags:
- axolotl
- dpo
- trl
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- HumanLLMs/Human-Like-DPO-Dataset
model-index:
- name: Humanish-LLama3.1-8B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 64.98
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 28.01
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 8.46
      name: exact match
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 0.78
      name: acc_norm
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 2
      name: acc_norm
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 30.02
      name: accuracy
    source:
      url: >-
        https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct
      name: Open LLM Leaderboard
pipeline_tag: text-generation
library_name: transformers
---

<div align="center">
<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/63da3d7ae697e5898cb86854/H-vpXOX6KZu01HnV87Jk5.jpeg" width="320" height="320" />
<h1>Enhancing Human-Like Responses in Large Language Models</h1>
</div>

<p align="center">
  | 🤗 <a href="https://huggingface.co./collections/HumanLLMs/human-like-humanish-llms-6759fa68f22e11eb1a10967e">Models</a> |
  📊 <a href="https://huggingface.co./datasets/HumanLLMs/Human-Like-DPO-Dataset">Dataset</a> |
  📄 <a href="https://arxiv.org/abs/2501.05032">Paper</a> |
</p>

# 🚀 Human-Like-Llama3-8B-Instruct

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co./meta-llama/Meta-Llama-3-8B-Instruct), specifically optimized to generate more human-like and conversational responses.

The fine-tuning process employed both [Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) and [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) to enhance natural language understanding, conversational coherence, and emotional intelligence in interactions.

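As a rough illustration of the DPO objective, the sketch below computes the per-example loss from sequence log-probabilities under the policy being trained and a frozen reference model. The function name, the argument names, and the default `beta=0.1` are assumptions made for illustration; the actual training ran through Axolotl and TRL, not this code.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities."""
    # Implicit rewards: how much more likely each response is under
    # the policy than under the reference model, scaled by beta
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin), in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: margin is 0, loss is log(2)
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy prefers the chosen response more than the reference does: lower loss
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

When the policy and reference assign the same log-probabilities, the margin is zero and the loss equals log 2; it falls as the policy favors the chosen response more strongly than the reference does.
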

The process of creating this model is detailed in the research paper [“Enhancing Human-Like Responses in Large Language Models”](https://arxiv.org/abs/2501.05032).

# 🛠️ Training Configuration

- **Base Model:** Llama3-8B-Instruct
- **Framework:** Axolotl v0.4.1
- **Hardware:** 2x NVIDIA A100 (80 GB) GPUs
- **Training Time:** ~2 hours 20 minutes
- **Dataset:** Synthetic dataset with ≈11,000 samples across 256 diverse topics

<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

chat_template: llama3
rl: dpo
datasets:
  - path: HumanLLMs/humanish-dpo-project
    type: llama3.prompt_pairs
    chat_template: llama3

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./humanish-llama3-8b-instruct

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 4
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Humanish-DPO
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

hub_model_id: HumanLLMs/Humanish-LLama3.1-8B-Instruct

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

save_safetensors: true

```

</details><br>

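The batch and LoRA settings in the config imply a few derived quantities. The sketch below computes them under stated assumptions: plain data parallelism across the 2 GPUs listed under Training Configuration, the 10,884-sample count given in the Dataset section, the standard LoRA scaling `lora_alpha / lora_r`, and a 5% hold-out from `val_set_size`. Variable names are illustrative, and the real step count may differ depending on how Axolotl splits and batches the data.

```python
import math

# Values copied from the config and the Training Configuration list
micro_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 2            # assumes simple data parallelism across both A100s
lora_r, lora_alpha = 8, 4
num_samples = 10_884    # sample count from the Dataset section
val_set_size = 0.05

# Sequences contributing to each optimizer step
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

# Standard LoRA scaling factor applied to the low-rank update
lora_scale = lora_alpha / lora_r

# Rough number of optimizer steps in the single epoch (assumes ceiling division)
train_samples = round(num_samples * (1 - val_set_size))
approx_steps = math.ceil(train_samples / effective_batch_size)

print(effective_batch_size, lora_scale, approx_steps)
```

Under these assumptions the effective batch is 32 sequences per optimizer step and the LoRA update is scaled by 0.5, so the single epoch amounts to a little over 300 optimizer steps.
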

# 💬 Prompt Template

Use the Llama 3 prompt template with this model:

### Llama3

```
<|start_header_id|>system<|end_header_id|>
{system}<|eot_id|>

<|start_header_id|>user<|end_header_id|>
{user}<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>
{assistant}<|eot_id|>
```

This prompt template is available as a [chat template](https://huggingface.co./docs/transformers/main/chat_templating), which means you can format messages using the `tokenizer.apply_chat_template()` method:

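```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HumanLLMs/Human-Like-LLama3-8B-Instruct"  # repository ID from the Models table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
]
# return_dict=True returns input_ids and attention_mask, so the result can be unpacked into generate()
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
model.generate(**gen_input)
```
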
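To see roughly what the chat template produces, the prompt can also be assembled by hand. The helper below is a hypothetical sketch (the name `format_llama3_prompt` is not part of any library) that follows the published Llama 3 layout: a `<|begin_of_text|>` token, then each message as a header, a blank line, the content, and `<|eot_id|>`, ending with an open assistant header. The exact whitespace is an assumption here; prefer `tokenizer.apply_chat_template()` in real use.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Build a Llama 3 style prompt string from a list of role/content dicts."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        header = f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        parts.append(header + message["content"].strip() + "<|eot_id|>")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = format_llama3_prompt(messages)
print(prompt)
```

Building the string manually is mainly useful for debugging or for inference servers that accept raw text; the tokenizer's own template remains the reference.
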

# 🤖 Models

| Model | Download |
|:---------------------:|:-----------------------------------------------------------------------:|
| Human-Like-Llama-3-8B-Instruct | 🤗 [HuggingFace](https://huggingface.co./HumanLLMs/Human-Like-LLama3-8B-Instruct) |
| Human-Like-Qwen-2.5-7B-Instruct | 🤗 [HuggingFace](https://huggingface.co./HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) |
| Human-Like-Mistral-Nemo-Instruct | 🤗 [HuggingFace](https://huggingface.co./HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407) |

# 🎯 Benchmark Results

| **Group** | **Model** | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** |
|--------------------------------|--------------------------------|-------------|------------|---------|----------------|----------|----------|--------------|
| **Llama Models** | Human-Like-Llama-3-8B-Instruct | 22.37 | **64.97** | 28.01 | 8.45 | 0.78 | **2.00** | 30.01 |
| | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 |
| | *Difference (Human-Like)* | -1.20 | **-9.11** | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 |
| **Qwen Models** | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 |
| | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 |
| | *Difference (Human-Like)* | -0.20 | -3.01 | -0.41 | 0.00 | **+1.01**| -0.03 | **+1.24** |
| **Mistral Models** | Human-Like-Mistral-Nemo-Instruct | 22.88 | **54.51** | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 |
| | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 |
| | *Difference (Human-Like)* | -0.65 | **-9.29** | **+3.02**| **+1.73** | -0.34 | +0.91 | +0.03 |

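Each per-benchmark entry in a *Difference* row is the fine-tuned score minus the base score. A small sketch reproducing the Llama row from the table values (the dictionary and variable names are illustrative):

```python
# Scores copied from the Llama rows of the benchmark table
human_like = {"IFEval": 64.97, "BBH": 28.01, "MATH Lvl 5": 8.45,
              "GPQA": 0.78, "MuSR": 2.00, "MMLU-PRO": 30.01}
base = {"IFEval": 74.08, "BBH": 28.24, "MATH Lvl 5": 8.68,
        "GPQA": 1.23, "MuSR": 1.60, "MMLU-PRO": 29.60}

# Difference = fine-tuned score minus base score, rounded to two decimals
differences = {name: round(human_like[name] - base[name], 2) for name in human_like}

for name, delta in differences.items():
    print(f"{name:<11} {delta:+.2f}")
```

A negative value means the fine-tuned model scored lower than its base model on that benchmark.
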

# 📊 Dataset

The fine-tuning dataset was generated with Llama 3 models. It contains 10,884 samples across 256 distinct topics, such as technology, daily life, science, history, and arts. Each sample consists of:

- **Human-like responses:** Natural, conversational answers mimicking human dialogue.
- **Formal responses:** Structured and precise answers with a more formal tone.

The dataset has been open-sourced and is available at:

- 👉 [Human-Like-DPO-Dataset](https://huggingface.co./datasets/HumanLLMs/Human-Like-DPO-Dataset)

More details on the dataset creation process can be found in the accompanying research paper.

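For DPO, each sample is typically arranged as a preference pair. The sketch below assumes the human-like response is treated as the preferred ("chosen") answer and the formal response as the "rejected" one. The class name, the field names `prompt`, `chosen`, and `rejected`, and the example text are illustrative assumptions, not confirmed details of the released dataset.

```python
from dataclasses import dataclass, asdict


@dataclass
class PreferencePair:
    prompt: str     # question or conversation starter
    chosen: str     # human-like response (assumed to be the preferred one)
    rejected: str   # formal response (assumed to be the dispreferred one)


# Made-up example text, not taken from the dataset
pair = PreferencePair(
    prompt="What's your favorite way to spend a rainy day?",
    chosen="Honestly? A blanket, a good book, and way too much tea.",
    rejected="I do not have personal preferences, but reading is a common choice.",
)

# Plain dict form, convenient for writing records to JSON lines
record = asdict(pair)
print(sorted(record))
```
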

# 📝 Citation

```
@misc{çalık2025enhancinghumanlikeresponseslarge,
      title={Enhancing Human-Like Responses in Large Language Models},
      author={Ethem Yağız Çalık and Talha Rüzgar Akkuş},
      year={2025},
      eprint={2501.05032},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.05032},
}
```