---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- unsloth
- generated_from_trainer
base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
model-index:
- name: dpo
  results: []
---


# dpo

This model is a PEFT adapter trained with DPO on top of [unsloth/llama-3-8b-Instruct-bnb-4bit](https://huggingface.co./unsloth/llama-3-8b-Instruct-bnb-4bit); the preference dataset used for training is not recorded in this card.
It achieves the following results on the evaluation set:
- Loss: 0.6257
- Rewards/chosen: 0.8141
- Rewards/rejected: 0.4945
- Rewards/accuracies: 0.6431
- Rewards/margins: 0.3196
- Logps/rejected: -229.7856
- Logps/chosen: -249.2073
- Logits/rejected: -0.6789
- Logits/chosen: -0.6135
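
For context on how to read these metrics (this interpretation follows the standard DPO formulation as used in TRL and is not recorded in the card itself): the "reward" of a completion $y$ for a prompt $x$ is the implicit reward of the policy relative to the reference model,

$$
r(x, y) = \beta \,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr),
$$

so `Rewards/chosen` and `Rewards/rejected` are this quantity averaged over chosen and rejected completions, `Rewards/margins` is their difference (0.8141 - 0.4945 = 0.3196 above), and `Rewards/accuracies` is the fraction of pairs where the chosen completion receives the higher reward. The value of $\beta$ used for this run is not recorded here.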

## Model description

More information needed

## Intended uses & limitations

More information needed
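
As a starting point for using the adapter, the sketch below loads it on top of the 4-bit base model with `peft` and runs a single chat-formatted generation. It is a minimal, untested example: the adapter repo id (`your-username/dpo`) is a placeholder, and the prompt and generation settings are illustrative only.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Placeholder repo id: replace with wherever this adapter is actually hosted.
adapter_id = "your-username/dpo"

# Loads the base model recorded in the adapter config
# (unsloth/llama-3-8b-Instruct-bnb-4bit) and attaches the adapter weights.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-Instruct-bnb-4bit")

# Build a Llama-3 chat prompt and generate a short reply.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```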

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 0
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 750
- mixed_precision_training: Native AMP
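
These values map fairly directly onto `transformers.TrainingArguments` fields. The sketch below mirrors the list above; the output directory name is assumed from the run name, and DPO-specific settings (e.g. `beta`) and the dataset are omitted because they are not recorded in this card.

```python
from transformers import TrainingArguments

# Sketch of the training configuration mirroring the hyperparameters above.
training_args = TrainingArguments(
    output_dir="dpo",                # assumed from the run name
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,   # 4 x 8 = total train batch size of 32
    seed=0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=750,
    fp16=True,                       # "Native AMP"; whether fp16 or bf16 was used is not recorded
)
```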

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6904        | 0.0372 | 28   | 0.6811          | 0.2766         | 0.2476           | 0.5770             | 0.0290          | -232.2545      | -254.5816    | -0.5471         | -0.5010       |
| 0.6591        | 0.0745 | 56   | 0.6623          | 0.9939         | 0.8694           | 0.5927             | 0.1245          | -226.0365      | -247.4085    | -0.5351         | -0.4798       |
| 0.6297        | 0.1117 | 84   | 0.6542          | 1.1966         | 0.9862           | 0.6136             | 0.2104          | -224.8689      | -245.3818    | -0.4689         | -0.4120       |
| 0.5985        | 0.1489 | 112  | 0.6540          | 1.5211         | 1.2525           | 0.6087             | 0.2687          | -222.2059      | -242.1367    | -0.4989         | -0.4262       |
| 0.6603        | 0.1862 | 140  | 0.6459          | 0.7737         | 0.5130           | 0.6304             | 0.2607          | -229.6009      | -249.6110    | -0.5779         | -0.5054       |
| 0.619         | 0.2234 | 168  | 0.6411          | 0.9352         | 0.6917           | 0.6222             | 0.2435          | -227.8137      | -247.9963    | -0.5842         | -0.5261       |
| 0.6497        | 0.2606 | 196  | 0.6427          | 0.8696         | 0.6404           | 0.6282             | 0.2292          | -228.3268      | -248.6518    | -0.5798         | -0.5255       |
| 0.6014        | 0.2979 | 224  | 0.6397          | 0.8941         | 0.6357           | 0.6263             | 0.2583          | -228.3730      | -248.4069    | -0.6397         | -0.5816       |
| 0.594         | 0.3351 | 252  | 0.6361          | 0.7069         | 0.4027           | 0.6319             | 0.3043          | -230.7038      | -250.2785    | -0.6434         | -0.5848       |
| 0.5898        | 0.3723 | 280  | 0.6356          | 1.0373         | 0.7462           | 0.6278             | 0.2911          | -227.2686      | -246.9745    | -0.6340         | -0.5714       |
| 0.639         | 0.4096 | 308  | 0.6342          | 0.7199         | 0.4321           | 0.6342             | 0.2878          | -230.4095      | -250.1490    | -0.6956         | -0.6293       |
| 0.6289        | 0.4468 | 336  | 0.6363          | 0.4299         | 0.1879           | 0.6248             | 0.2420          | -232.8515      | -253.0488    | -0.6705         | -0.6155       |
| 0.6304        | 0.4840 | 364  | 0.6321          | 0.7719         | 0.5053           | 0.6435             | 0.2667          | -229.6779      | -249.6284    | -0.6279         | -0.5652       |
| 0.6126        | 0.5213 | 392  | 0.6325          | 0.5194         | 0.2033           | 0.6375             | 0.3161          | -232.6973      | -252.1539    | -0.6785         | -0.6117       |
| 0.5974        | 0.5585 | 420  | 0.6254          | 0.7418         | 0.4269           | 0.6428             | 0.3149          | -230.4618      | -249.9303    | -0.6823         | -0.6170       |
| 0.6185        | 0.5957 | 448  | 0.6267          | 0.9534         | 0.6106           | 0.6409             | 0.3428          | -228.6247      | -247.8141    | -0.6532         | -0.5866       |
| 0.604         | 0.6330 | 476  | 0.6284          | 0.8011         | 0.4691           | 0.6394             | 0.3320          | -230.0398      | -249.3374    | -0.6842         | -0.6177       |
| 0.6154        | 0.6702 | 504  | 0.6269          | 0.8353         | 0.5307           | 0.6431             | 0.3046          | -229.4234      | -248.9947    | -0.6705         | -0.6051       |
| 0.5936        | 0.7074 | 532  | 0.6277          | 0.7287         | 0.4206           | 0.6469             | 0.3082          | -230.5248      | -250.0604    | -0.6887         | -0.6226       |
| 0.6291        | 0.7447 | 560  | 0.6260          | 0.8539         | 0.5327           | 0.6439             | 0.3211          | -229.4030      | -248.8091    | -0.6758         | -0.6096       |
| 0.6169        | 0.7819 | 588  | 0.6255          | 0.8797         | 0.5669           | 0.6461             | 0.3127          | -229.0613      | -248.5513    | -0.6690         | -0.6041       |
| 0.5934        | 0.8191 | 616  | 0.6256          | 0.8582         | 0.5399           | 0.6461             | 0.3183          | -229.3312      | -248.7658    | -0.6753         | -0.6095       |
| 0.6004        | 0.8564 | 644  | 0.6257          | 0.8263         | 0.5074           | 0.6450             | 0.3189          | -229.6564      | -249.0845    | -0.6761         | -0.6110       |
| 0.6282        | 0.8936 | 672  | 0.6256          | 0.8133         | 0.4949           | 0.6442             | 0.3184          | -229.7819      | -249.2152    | -0.6748         | -0.6101       |
| 0.5572        | 0.9309 | 700  | 0.6258          | 0.8122         | 0.4938           | 0.6442             | 0.3184          | -229.7925      | -249.2255    | -0.6781         | -0.6129       |
| 0.595         | 0.9681 | 728  | 0.6256          | 0.8140         | 0.4943           | 0.6428             | 0.3197          | -229.7873      | -249.2078    | -0.6788         | -0.6134       |


### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1