---
license: llama3.1
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
model-index:
- name: llama3.1_8b_dpo_bwgenerator_test2
  results: []
---


# llama3.1_8b_dpo_bwgenerator_test2

This model is a DPO fine-tune (a PEFT adapter trained with TRL) of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0977
- Rewards/chosen: -12.6599
- Rewards/rejected: -40.7969
- Rewards/accuracies: 0.9925
- Rewards/margins: 28.1371
- Logps/rejected: -518.0425
- Logps/chosen: -211.2417
- Logits/rejected: -1.2740
- Logits/chosen: -1.8549
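
The reward metrics above are related to each other. In TRL's DPO trainer, `Rewards/margins` is the mean of the chosen reward minus the rejected reward, and `Rewards/accuracies` is the fraction of evaluation pairs in which the chosen reward exceeds the rejected one. The following is a minimal sketch that checks the reported margin against the two reported reward means. The variable names are illustrative, and because every value is rounded to four decimals, agreement is only expected up to a small tolerance.

```python
# Final evaluation metrics, copied from the list above.
rewards_chosen = -12.6599
rewards_rejected = -40.7969
reported_margin = 28.1371

# The margin is the mean of (chosen - rejected). Since the mean is linear,
# it should equal the difference of the two reported means up to rounding.
recomputed_margin = rewards_chosen - rewards_rejected
rounding_tolerance = 2e-4  # each reported value is rounded to 4 decimals

margin_is_consistent = abs(recomputed_margin - reported_margin) < rounding_tolerance
print(f"recomputed margin: {recomputed_margin:.4f}, consistent: {margin_is_consistent}")
```

Both mean rewards are negative, which in DPO means the fine-tuned policy assigns lower log-probability than the reference model to chosen and rejected responses alike. The training objective depends on the gap between them, not on their sign.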

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
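
As a rough sketch, the values above can be written as keyword arguments for `transformers.TrainingArguments`; in TRL releases that provide `DPOConfig`, that class extends `TrainingArguments` and should accept the same names. Two things here are assumptions rather than facts from this card: that training ran on a single device (so `train_batch_size` equals the per-device batch size), and the `output_dir` value in the usage comment, which is hypothetical. The DPO `beta` and the PEFT/LoRA settings are not listed in this card and are deliberately left out.

```python
# Hyperparameters from the list above, expressed as keyword arguments
# for transformers.TrainingArguments.
# Assumption: single device, so train_batch_size maps to the per-device value.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
}

# Hypothetical usage (requires transformers to be installed):
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="dpo_output", **training_kwargs)
```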

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.065         | 0.2396 | 1000 | 0.1978          | -24.9705       | -63.8717         | 0.9881             | 38.9011         | -748.7902      | -334.3486    | -1.2031         | -1.8039       |
| 0.0803        | 0.4793 | 2000 | 0.1339          | -16.1506       | -46.8097         | 0.9925             | 30.6591         | -578.1700      | -246.1489    | -1.2214         | -1.8149       |
| 0.0588        | 0.7189 | 3000 | 0.1012          | -12.8597       | -41.3756         | 0.9925             | 28.5159         | -523.8289      | -213.2401    | -1.2775         | -1.8541       |
| 0.0422        | 0.9585 | 4000 | 0.0977          | -12.6599       | -40.7969         | 0.9925             | 28.1371         | -518.0425      | -211.2417    | -1.2740         | -1.8549       |
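
The `Epoch` and `Step` columns can be used to estimate the length of one epoch, which this card does not state directly. The sketch below divides each logged step by its epoch fraction. Converting steps to a training-set size additionally assumes a single device and no gradient accumulation, neither of which is confirmed by this card, so the example count is only an estimate.

```python
# (step, epoch) pairs copied from the table above.
logged = [(1000, 0.2396), (2000, 0.4793), (3000, 0.7189), (4000, 0.9585)]
TRAIN_BATCH_SIZE = 4  # train_batch_size from the hyperparameter list


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from each logged (step, epoch) pair."""
    return [step / epoch for step, epoch in rows]


estimates = steps_per_epoch(logged)
mean_steps = sum(estimates) / len(estimates)

# Assumption: one device and no gradient accumulation, so each optimizer
# step consumes exactly TRAIN_BATCH_SIZE preference pairs.
approx_examples = round(mean_steps) * TRAIN_BATCH_SIZE

print(f"steps per epoch: about {mean_steps:.0f}")
print(f"training pairs (estimate): about {approx_examples}")
```

The four estimates agree to within about one step, so the last evaluation at step 4000 sits slightly before the end of the single training epoch.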


### Framework versions

- PEFT 0.10.0
- Transformers 4.44.0
- PyTorch 2.3.0+cu121
- Datasets 2.14.7
- Tokenizers 0.19.1