|
--- |
|
license: apache-2.0 |
|
library_name: transformers |
|
base_model: |
|
- nbeerbower/Mahou-1.5-mistral-nemo-12B-lorablated |
|
datasets: |
|
- nbeerbower/Schule-DPO |
|
- nbeerbower/Arkhaios-DPO |
|
- nbeerbower/Purpura-DPO |
|
--- |
|
|
|
![image/png](https://huggingface.co./nbeerbower/mistral-nemo-kartoffel-12B/resolve/main/kartoffel.png?download=true) |
|
|
|
# mistral-nemo-kartoffel-12B |
|
|
|
[Mahou-1.5-mistral-nemo-12B-lorablated](https://huggingface.co./nbeerbower/Mahou-1.5-mistral-nemo-12B-lorablated) fine-tuned on Schule-DPO, Arkhaios-DPO, and Purpura-DPO.
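
A minimal usage sketch with transformers (the prompt, chat-template call, and generation settings here are illustrative, not the exact settings used in training or evaluation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/mistral-nemo-kartoffel-12B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with enough memory for bf16 weights
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a short haiku about potatoes."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```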
|
|
|
### Method |
|
|
|
[ORPO-tuned](https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html) on 8x A100 GPUs for 2 epochs.
|
|
|
QLoRA config: |
|
```python
# QLoRA config
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

torch_dtype = torch.bfloat16  # compute dtype (assumed; consistent with bf16=True below)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

# LoRA config
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)
```
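
For context, a hedged sketch of how the base model would typically be loaded with this quantization config before training (the exact loading code is not shown on this card; `prepare_model_for_kbit_training` is the usual QLoRA pattern):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import prepare_model_for_kbit_training

base_model = "nbeerbower/Mahou-1.5-mistral-nemo-12B-lorablated"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,  # 4-bit NF4 config defined above
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # prepares the quantized model for k-bit training
```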
|
|
|
Training config: |
|
```python
from trl import ORPOConfig

new_model = "mistral-nemo-kartoffel-12B"  # run name (assumed here; defined elsewhere in the training script)

orpo_args = ORPOConfig(
    run_name=new_model,
    learning_rate=8e-6,
    lr_scheduler_type="linear",
    max_length=2048,
    max_prompt_length=1024,
    max_completion_length=1024,
    beta=0.1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=10,
    max_grad_norm=10,
    report_to="wandb",
    output_dir="./results/",
    bf16=True,
    gradient_checkpointing=True,
)
```
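
For completeness, a sketch of how the pieces above would be wired together with TRL's `ORPOTrainer`. The dataset handling below is an assumption (the three preference datasets listed in the metadata, concatenated and split); the trainer attaches the LoRA adapters from `peft_config` itself:

```python
from datasets import load_dataset, concatenate_datasets
from trl import ORPOTrainer

# Assumed dataset preparation: concatenate the preference datasets listed above
# (each is expected to provide prompt/chosen/rejected columns) and hold out a small eval split.
dataset = concatenate_datasets([
    load_dataset("nbeerbower/Schule-DPO", split="train"),
    load_dataset("nbeerbower/Arkhaios-DPO", split="train"),
    load_dataset("nbeerbower/Purpura-DPO", split="train"),
]).train_test_split(test_size=0.01)

trainer = ORPOTrainer(
    model=model,                  # quantized base model from the loading sketch
    args=orpo_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,      # LoRA adapters are attached by the trainer
    tokenizer=tokenizer,          # processing_class= in newer TRL versions
)
trainer.train()
```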
|
|