---
base_model: unsloth/Meta-Llama-3.1-8B-bnb-4bit
language:
- en
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
- dpo
license: llama3.1
datasets:
- reciperesearch/dolphin-sft-v0.1-preference
pipeline_tag: text-generation
license_name: llama3.1
license_link: LICENSE
model_creator: EpistemeAI
quantized_by: EpistemeAI
---

**GGUF quantizations:**

- q4_k_m
- 16-bit
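
As a quick usage sketch, the q4_k_m build can be loaded with llama-cpp-python; the GGUF file name below is a hypothetical placeholder, since the exact artifact names are not listed in this card:

```python
# Minimal sketch: run the q4_k_m GGUF with llama-cpp-python.
# The model_path is a hypothetical placeholder -- point it at the
# actual GGUF file shipped with this repository.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-8B-q4_k_m.gguf",  # hypothetical file name
    n_ctx=4096,  # context window; raise for longer prompts
)

output = llm("Write one sentence about llamas.", max_tokens=64)
print(output["choices"][0]["text"])
```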

This model is based on Meta Llama 3.1 8B and is governed by the Llama 3.1 Community License.
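
As a usage sketch, the model can also be run with the transformers text-generation pipeline. The repository id below is a hypothetical placeholder for this model's Hugging Face repo, and the generation settings are illustrative only:

```python
# Minimal text-generation sketch with transformers.
# The repo id is a hypothetical placeholder; substitute this model's
# actual Hugging Face repository.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="EpistemeAI/<this-model-repo>",  # hypothetical repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

print(pipe("Explain ORPO in one sentence.", max_new_tokens=64)[0]["generated_text"])
```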

Fine-tuned using ORPO (Odds Ratio Preference Optimization).

## Training Details

### Training Data

- Dataset: [reciperesearch/dolphin-sft-v0.1-preference](https://huggingface.co/datasets/reciperesearch/dolphin-sft-v0.1-preference)

### Training Procedure

The model was fine-tuned with ORPO (Odds Ratio Preference Optimization), which folds preference alignment directly into the supervised fine-tuning loss through an odds-ratio penalty that favors chosen over rejected responses, so no separate reference model is required. A reproduction sketch follows.
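
For reference, ORPO (Hong et al., 2024) augments the supervised fine-tuning loss with an odds-ratio term over preference pairs:

$$\mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}, \qquad \mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),$$

where $\mathrm{odds}_\theta(y \mid x) = P_\theta(y \mid x) / \big(1 - P_\theta(y \mid x)\big)$, $y_w$ is the chosen response, and $y_l$ the rejected one.

The sketch below shows roughly how such a run could be set up with Unsloth and TRL. Every hyperparameter is an illustrative assumption, not the configuration actually used for this model, and the tokenizer keyword of `ORPOTrainer` varies across TRL versions:

```python
# Illustrative ORPO fine-tuning sketch with Unsloth + TRL.
# All hyperparameters are assumptions for demonstration, not the
# values used to train this model.
from datasets import load_dataset
from unsloth import FastLanguageModel
from trl import ORPOConfig, ORPOTrainer

# Load the 4-bit base model this fine-tune starts from.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Preference data; assumed to expose prompt/chosen/rejected columns.
dataset = load_dataset(
    "reciperesearch/dolphin-sft-v0.1-preference", split="train"
)

trainer = ORPOTrainer(
    model=model,
    args=ORPOConfig(
        output_dir="orpo-out",  # hypothetical output path
        beta=0.1,               # weight of the odds-ratio term (lambda)
        max_steps=30,           # mirrors the 30-step TrainOutput below
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=8e-6,
    ),
    train_dataset=dataset,
    tokenizer=tokenizer,  # `processing_class=` in newer TRL releases
)
trainer.train()
```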

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

Final `TrainOutput` of the 30-step run (covering only ~1.5% of one epoch):

- global_step: 30
- training_loss: 4.25380277633667
- train_runtime: 679.3467 s
- train_samples_per_second: 0.353
- train_steps_per_second: 0.044
- total_flos: 0.0
- epoch: 0.015