|
---
license: apache-2.0
---
|
|
|
# Model Card for decruz07/kellemar-DPO-7B-e |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
A DPO fine-tune of teknium/OpenHermes-2.5-Mistral-7B, trained with a learning rate of 5e-5 for 300 steps.
|
## Model Details |
|
|
|
Trained with DPO beta = 0.05.
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
|
|
|
|
- **Developed by:** @decruz |
|
- **Funded by [optional]:** my full-time job |
|
- **Finetuned from model [optional]:** teknium/OpenHermes-2.5-Mistral-7B |
|
|
|
|
|
|
|
## Uses |
|
|
|
You can use this model for general inference. It can also serve as a starting point for further fine-tuning.
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
You can deploy the model in a Hugging Face Space, or load it directly in Python and run inference, as in the sketch below.
|
|
|
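As a minimal sketch (assuming the repo id `decruz07/kellemar-DPO-7B-e` and the ChatML prompt format used by the OpenHermes base model), inference with `transformers` could look like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "decruz07/kellemar-DPO-7B-e"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# ChatML-style prompt, following the OpenHermes-2.5 base model (assumed format)
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain DPO in one paragraph.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```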
|
|
|
## Training Details |
|
|
|
The following configuration was used (TRL's `DPOTrainer`; `model`, `ref_model`, `tokenizer`, `dataset`, `peft_config`, and `new_model` are defined earlier in the training notebook):

```python
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
```
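The exact `peft_config` and `new_model` values are not recorded in this card; as a hypothetical sketch, a typical LoRA configuration for a Mistral-7B DPO run might look like this:

```python
# Hypothetical sketch only: the exact LoRA settings and output name used for this
# model are assumptions, not values recorded in this card.
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

new_model = "kellemar-DPO-7B-e"  # output directory passed to TrainingArguments (assumed)
```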
|
|
|
### Training Data |
|
|
|
This model was trained on the argilla/distilabel-intel-orca-dpo-pairs dataset: https://huggingface.co./datasets/argilla/distilabel-intel-orca-dpo-pairs. A sketch of loading it is shown below.
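As a sketch (column names and prompt template are assumptions based on the dataset card and the base model's ChatML format), the pairs could be mapped into the `prompt`/`chosen`/`rejected` columns expected by `DPOTrainer` like this:

```python
# Hypothetical preprocessing sketch; the exact preprocessing used for this model
# may differ.
from datasets import load_dataset

dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

def to_dpo_format(example):
    # Build a ChatML prompt from the system and user turns, and keep the
    # preferred/rejected completions for DPO.
    prompt = (
        f"<|im_start|>system\n{example['system']}<|im_end|>\n"
        f"<|im_start|>user\n{example['input']}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    return {
        "prompt": prompt,
        "chosen": example["chosen"] + "<|im_end|>\n",
        "rejected": example["rejected"] + "<|im_end|>\n",
    }

dataset = dataset.map(to_dpo_format, remove_columns=dataset.column_names)
```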
|
|
|
### Training Procedure |
|
|
|
Trained with Maxime Labonne's Google Colab notebook on fine-tuning Mistral 7B with DPO.
|
|
|
## Model Card Authors [optional] |
|
|
|
@decruz |
|
|
|
## Model Card Contact |
|
|
|
@decruz on X/Twitter |