---
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
- generated_from_trainer
model-index:
- name: Llama-2-7b-dpo-10k
  results: []
---

# Llama-2-7b-dpo-10k

This model is a version of [meta-llama/Llama-2-7b-hf](https://huggingface.co./meta-llama/Llama-2-7b-hf) fine-tuned with direct preference optimization (DPO) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7215
- Rewards/real: 5.3782
- Rewards/generated: 4.9113
- Rewards/accuracies: 0.6923
- Rewards/margins: 0.4668
- Logps/generated: -113.1980
- Logps/real: -125.7774
- Logits/generated: -1.1385
- Logits/real: -1.0466

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|:-------------:|:------:|:----:|:---------------:|:------------:|:-----------------:|:------------------:|:---------------:|:---------------:|:----------:|:----------------:|:-----------:|
| 0.8559        | 0.1984 | 62   | 0.8605          | 0.4128       | 0.4099            | 0.4808             | 0.0029          | -158.2126       | -175.4314  | -0.8219          | -0.6123     |
| 0.7999        | 0.3968 | 124  | 0.8323          | 1.5863       | 1.5154            | 0.5192             | 0.0709          | -147.1573       | -163.6966  | -0.8057          | -0.6067     |
| 0.7846        | 0.5952 | 186  | 0.7979          | 2.4470       | 2.3135            | 0.5577             | 0.1335          | -139.1767       | -155.0893  | -0.8686          | -0.6862     |
| 0.7916        | 0.7936 | 248  | 0.7819          | 3.0117       | 2.8464            | 0.6346             | 0.1653          | -133.8475       | -149.4422  | -0.9049          | -0.7322     |
| 0.7714        | 0.9920 | 310  | 0.7630          | 3.4214       | 3.1941            | 0.6346             | 0.2273          | -130.3704       | -145.3455  | -0.9511          | -0.7905     |
| 0.6780        | 1.1904 | 372  | 0.7552          | 3.9523       | 3.6931            | 0.6538             | 0.2592          | -125.3802       | -140.0360  | -0.9800          | -0.8279     |
| 0.6337        | 1.3888 | 434  | 0.7464          | 4.4541       | 4.1602            | 0.6346             | 0.2939          | -120.7093       | -135.0177  | -1.0279          | -0.8860     |
| 0.6575        | 1.5872 | 496  | 0.7352          | 4.8501       | 4.4918            | 0.6538             | 0.3583          | -117.3935       | -131.0585  | -1.0562          | -0.9285     |
| 0.6606        | 1.7856 | 558  | 0.7270          | 5.1119       | 4.7485            | 0.6538             | 0.3634          | -114.8267       | -128.4403  | -1.0969          | -0.9780     |
| 0.6319        | 1.9840 | 620  | 0.7260          | 5.2581       | 4.8563            | 0.6538             | 0.4018          | -113.7479       | -126.9782  | -1.0953          | -0.9815     |
| 0.5520        | 2.1824 | 682  | 0.7295          | 5.3469       | 4.9377            | 0.6731             | 0.4092          | -112.9344       | -126.0898  | -1.1133          | -1.0072     |
| 0.5541        | 2.3808 | 744  | 0.7229          | 5.4093       | 4.9819            | 0.6923             | 0.4274          | -112.4924       | -125.4664  | -1.1322          | -1.0330     |
| 0.5342        | 2.5792 | 806  | 0.7246          | 5.3967       | 4.9520            | 0.6923             | 0.4447          | -112.7909       | -125.5919  | -1.1353          | -1.0397     |
| 0.5318        | 2.7776 | 868  | 0.7229          | 5.3656       | 4.9040            | 0.6731             | 0.4615          | -113.2710       | -125.9033  | -1.1367          | -1.0427     |
| 0.5396        | 2.9760 | 930  | 0.7215          | 5.3782       | 4.9113            | 0.6923             | 0.4668          | -113.1980       | -125.7774  | -1.1385          | -1.0466     |

### Framework versions

- Transformers 4.43.3
- PyTorch 2.2.2+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
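
### Notes on the reward metrics

The `Rewards/*` columns above follow the standard DPO convention used by TRL-style trainers (an assumption based on the metric names; the card does not say which training script produced it): each reward is the β-scaled log-probability ratio of the policy against the frozen reference model, computed on the preferred ("real") and model-generated ("generated") completions. A minimal sketch, assuming summed per-sequence log-probabilities and an illustrative `beta=0.1` (the actual β is not stated on this card):

```python
import torch
import torch.nn.functional as F

def dpo_reward_stats(policy_logps_real, ref_logps_real,
                     policy_logps_gen, ref_logps_gen, beta=0.1):
    """Implicit DPO rewards and loss from per-sequence log-probabilities.

    Each argument is a 1-D tensor of summed log-probs over one batch.
    beta=0.1 is illustrative only; the card does not state the value used.
    """
    rewards_real = beta * (policy_logps_real - ref_logps_real)  # -> Rewards/real
    rewards_gen = beta * (policy_logps_gen - ref_logps_gen)     # -> Rewards/generated
    margins = rewards_real - rewards_gen                        # -> Rewards/margins
    accuracy = (margins > 0).float().mean()                     # -> Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                        # DPO loss
    return loss, rewards_real.mean(), rewards_gen.mean(), margins.mean(), accuracy
```

Under this reading, the final evaluation row (margin 0.4668, accuracy 0.6923) means the policy assigns a higher implicit reward to the preferred completion on roughly 69% of evaluation pairs.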
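
## How to use

A minimal inference sketch with 🤗 Transformers. The repository id below is a placeholder; substitute the actual Hub path where these weights are hosted:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Llama-2-7b-dpo-10k"  # placeholder; replace with the real Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # requires `accelerate` for automatic device placement
)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```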