---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF6konly
model-index:
- name: zephyr-7b-dpop-uf6k-qlora-5e-7-epoch3
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpop-uf6k-qlora-5e-7-epoch3

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF6konly dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6877
- Positive Losses: 0.0653
- Dpo Losses: 0.6792
- Rewards/chosen: 0.0804
- Rewards/rejected: 0.0515
- Rewards/accuracies: 0.6786
- Rewards/margins: 0.0289
- Rewards/margins Max: 0.0914
- Rewards/margins Min: -0.0315
- Rewards/margins Std: 0.0551
- Logps/rejected: -254.0317
- Logps/chosen: -277.1849
- Logits/rejected: -2.8006
- Logits/chosen: -2.8458
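
The reward figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard DPO formulation, in which the per-example loss is `-log(sigmoid(margin))` and the margin is the chosen reward minus the rejected reward. The card does not state the exact loss used for this run, so the function `dpo_loss` and the Taylor approximation are illustrative assumptions, not the training code.

```python
import math

# Values copied from the evaluation summary in this card.
rewards_chosen = 0.0804
rewards_rejected = 0.0515
margin_mean = 0.0289
margin_std = 0.0551
reported_dpo_loss = 0.6792

# The mean margin is the gap between the mean chosen and mean rejected rewards.
computed_margin = rewards_chosen - rewards_rejected


def dpo_loss(margin: float) -> float:
    """Assumed standard DPO loss for one margin: -log(sigmoid(margin))."""
    return math.log1p(math.exp(-margin))


# Loss evaluated at the mean margin (ignores the spread of margins).
loss_at_mean = dpo_loss(margin_mean)

# Second-order Taylor expansion of the average loss around margin = 0:
# E[loss] ~ log(2) - E[m]/2 + E[m^2]/8, with E[m^2] = mean^2 + std^2.
loss_second_order = (
    math.log(2) - margin_mean / 2 + (margin_mean**2 + margin_std**2) / 8
)

print(f"computed margin:      {computed_margin:.4f}")
print(f"loss at mean margin:  {loss_at_mean:.4f}")
print(f"second-order average: {loss_second_order:.4f}")
print(f"reported DPO loss:    {reported_dpo_loss:.4f}")
```

Under these assumptions the computed values land close to the reported ones, which suggests the margins are small enough that the loss is still near `log(2)`, its value when the model cannot separate chosen from rejected responses.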

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
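
The batch-size and scheduler settings above can be related with a short sketch. It assumes the usual relationship `total = per_device * num_devices * gradient_accumulation_steps`, and a schedule with linear warmup followed by cosine decay to zero. The function `cosine_lr` and the value `total_steps = 1000` are hypothetical, chosen only for illustration; the card does not report the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list in this card.
learning_rate = 5e-07
train_batch_size = 2          # per device
num_devices = 8
total_train_batch_size = 16
warmup_ratio = 0.1

# Implied gradient accumulation, assuming
# total = per_device * num_devices * accumulation_steps.
per_step_batch = train_batch_size * num_devices
grad_accum_steps = total_train_batch_size // per_step_batch


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = learning_rate,
              ratio: float = warmup_ratio) -> float:
    """Illustrative schedule: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical step count, used only to show the shape of the schedule.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {cosine_lr(step, total_steps):.3e}")
```

With these numbers the implied gradient accumulation is 1, and the learning rate peaks at `5e-07` once warmup ends, after the first 10% of steps.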

### Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6932        | 0.3   | 100  | 0.6930          | 0.0052          | 0.6925     | 0.0096         | 0.0083           | 0.5794             | 0.0012          | 0.0066              | -0.0040             | 0.0047              | -258.3495      | -284.2642    | -2.8141         | -2.8594       |
| 0.6919        | 0.61  | 200  | 0.6915          | 0.0119          | 0.6904     | 0.0242         | 0.0187           | 0.6667             | 0.0055          | 0.0195              | -0.0083             | 0.0124              | -257.3141      | -282.7999    | -2.8133         | -2.8583       |
| 0.6903        | 0.91  | 300  | 0.6899          | 0.0165          | 0.6876     | 0.0395         | 0.0283           | 0.6667             | 0.0112          | 0.0379              | -0.0143             | 0.0232              | -256.3544      | -281.2695    | -2.8086         | -2.8537       |
| 0.6832        | 1.22  | 400  | 0.6892          | 0.0304          | 0.6847     | 0.0525         | 0.0351           | 0.7024             | 0.0174          | 0.0557              | -0.0196             | 0.0337              | -255.6741      | -279.9755    | -2.8057         | -2.8507       |
| 0.6776        | 1.52  | 500  | 0.6884          | 0.0444          | 0.6825     | 0.0647         | 0.0427           | 0.6905             | 0.0220          | 0.0710              | -0.0256             | 0.0433              | -254.9144      | -278.7508    | -2.8047         | -2.8495       |
| 0.6770        | 1.82  | 600  | 0.6873          | 0.0459          | 0.6811     | 0.0769         | 0.0519           | 0.6825             | 0.0250          | 0.0803              | -0.0280             | 0.0484              | -253.9932      | -277.5360    | -2.8047         | -2.8494       |
| 0.6796        | 2.13  | 700  | 0.6872          | 0.0548          | 0.6800     | 0.0798         | 0.0526           | 0.6825             | 0.0272          | 0.0865              | -0.0298             | 0.0521              | -253.9202      | -277.2366    | -2.8026         | -2.8477       |
| 0.6778        | 2.43  | 800  | 0.6875          | 0.0604          | 0.6795     | 0.0800         | 0.0518           | 0.6825             | 0.0282          | 0.0897              | -0.0307             | 0.0540              | -254.0074      | -277.2222    | -2.8024         | -2.8474       |
| 0.6739        | 2.74  | 900  | 0.6878          | 0.0651          | 0.6793     | 0.0802         | 0.0515           | 0.6706             | 0.0287          | 0.0914              | -0.0317             | 0.0550              | -254.0345      | -277.2037    | -2.8028         | -2.8477       |
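
As a reading aid for the table above, the following sketch picks out the checkpoints with the lowest validation loss and the highest reward accuracy. The tuples are copied from the Step, Validation Loss, and Rewards/accuracies columns; the variable names are illustrative only.

```python
# (step, validation_loss, rewards_accuracy) copied from the results table.
eval_log = [
    (100, 0.6930, 0.5794),
    (200, 0.6915, 0.6667),
    (300, 0.6899, 0.6667),
    (400, 0.6892, 0.7024),
    (500, 0.6884, 0.6905),
    (600, 0.6873, 0.6825),
    (700, 0.6872, 0.6825),
    (800, 0.6875, 0.6825),
    (900, 0.6878, 0.6706),
]

# Checkpoint with the lowest validation loss.
best_loss_step, best_loss, _ = min(eval_log, key=lambda row: row[1])

# Checkpoint with the highest reward accuracy.
best_acc_step, _, best_acc = max(eval_log, key=lambda row: row[2])

print(f"lowest validation loss {best_loss:.4f} at step {best_loss_step}")
print(f"highest reward accuracy {best_acc:.4f} at step {best_acc_step}")
```

The two criteria select different steps here, and validation loss is nearly flat from step 600 onward while the training loss keeps falling.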


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2