flammen15-gutenberg-DPO-v1-7B
A Mistral 7B LLM built by merging pretrained models and then fine-tuning on Jon Durbin's Gutenberg DPO dataset. Flammen specializes in exceptional character roleplay, creative writing, and general intelligence.
Method
Finetuned using an A100 on Google Colab. (plz give more gpu)
Reference guide: "Fine-tune a Mistral-7b model with Direct Preference Optimization" by Maxime Labonne.
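For readers new to Direct Preference Optimization, the per-example objective can be sketched in a few lines of plain Python. This is a minimal illustration of the standard sigmoid DPO loss, not the TRL implementation; the function names and the toy log-probabilities are hypothetical, and `beta=0.1` is taken from the configuration in this card.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full completion
    under the policy or the frozen reference model.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(logits)) == softplus(-logits)
    return softplus(-logits)


# Toy values (hypothetical): the policy matches the reference exactly,
# so the loss equals log(2).
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Toy values (hypothetical): the policy prefers the chosen completion
# more strongly than the reference does, so the loss drops below log(2).
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)

print(f"neutral:  {neutral:.4f}")
print(f"improved: {improved:.4f}")
```

The loss only falls when the policy widens the chosen-versus-rejected gap relative to the reference model, which is why a second copy of the model is loaded as `ref_model` in the configuration.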
Configuration
LoRA, model, and training settings:
# Imports needed by the snippet below.
# Assumes model_name, new_model, dataset, and tokenizer are already defined (not shown here).
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
    force_use_ref_model=True
)

# Fine-tune model with DPO
dpo_trainer.train()
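A few quantities implied by these settings are easy to miss when reading the arguments one by one. The sketch below derives them in plain Python. It assumes a single GPU (the card mentions one A100), the usual PEFT scaling of `lora_alpha / r`, and a linear-warmup-then-cosine schedule of the kind `lr_scheduler_type="cosine"` selects in `transformers`. The helper `lr_at_step` is hypothetical and written for illustration, not taken from any library.

```python
import math

# Values copied from the configuration above.
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 2
MAX_STEPS = 200
WARMUP_STEPS = 100
BASE_LR = 2e-5
LORA_R = 16
LORA_ALPHA = 16
NUM_GPUS = 1  # assumption: a single A100

# Preference pairs consumed per optimizer step and over the whole run.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
pairs_seen = effective_batch * MAX_STEPS

# Factor applied to the low-rank update before it is added to the frozen weight.
lora_scaling = LORA_ALPHA / LORA_R


def lr_at_step(step: int) -> float:
    """Learning rate under linear warmup followed by half-cosine decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(f"effective batch size: {effective_batch}")
print(f"preference pairs seen: {pairs_seen}")
print(f"LoRA scaling: {lora_scaling}")
for step in (0, 50, 100, 150, 200):
    print(f"step {step:3d}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions, half of the 200 steps are spent warming up, so the learning rate reaches its 2e-5 peak only at step 100 and decays for the remainder of the run.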
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 21.46 |
| IFEval (0-Shot) | 47.98 |
| BBH (3-Shot) | 32.67 |
| MATH Lvl 5 (4-Shot) | 6.72 |
| GPQA (0-shot) | 4.59 |
| MuSR (0-shot) | 12.53 |
| MMLU-PRO (5-shot) | 24.29 |
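The "Avg." row is consistent with an unweighted mean of the six benchmark scores. A quick check, assuming that is how the average is computed (the dictionary name is hypothetical):

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 47.98,
    "BBH (3-Shot)": 32.67,
    "MATH Lvl 5 (4-Shot)": 6.72,
    "GPQA (0-shot)": 4.59,
    "MuSR (0-shot)": 12.53,
    "MMLU-PRO (5-shot)": 24.29,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```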
Model tree for flammenai/flammen15-gutenberg-DPO-v1-7B
Base model: flammenai/flammen15-mistral-7B
Evaluation results (Open LLM Leaderboard)
- IFEval (0-Shot), strict accuracy: 47.980
- BBH (3-Shot), normalized accuracy: 32.670
- MATH Lvl 5 (4-Shot), exact match: 6.720
- GPQA (0-shot), acc_norm: 4.590
- MuSR (0-shot), acc_norm: 12.530
- MMLU-PRO (5-shot), accuracy on test set: 24.290