Beanpow committed on
Commit e49290e · verified · 1 Parent(s): f2a4763

Model save

README.md CHANGED
@@ -2,15 +2,9 @@
  license: apache-2.0
  base_model: alignment-handbook/zephyr-7b-sft-full
  tags:
- - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
- - trl
- - dpo
- - generated_from_trainer
- datasets:
- - HuggingFaceH4/ultrafeedback_binarized
  model-index:
  - name: zephyr-7b-dpo-full
  results: []
@@ -21,17 +15,17 @@ should probably proofread and complete it, then remove this comment. -->

  # zephyr-7b-dpo-full

- This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+ This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.4904
- - Rewards/chosen: -1.5809
- - Rewards/rejected: -2.6285
- - Rewards/accuracies: 0.7698
- - Rewards/margins: 1.0476
- - Logps/rejected: -523.0543
- - Logps/chosen: -440.0590
- - Logits/rejected: 0.7096
- - Logits/chosen: -0.3600
+ - Loss: 1.3374
+ - Rewards/chosen: -14.5395
+ - Rewards/rejected: -18.1313
+ - Rewards/accuracies: 0.6409
+ - Rewards/margins: 3.5918
+ - Logps/rejected: -2073.3389
+ - Logps/chosen: -1735.9152
+ - Logits/rejected: -0.6740
+ - Logits/chosen: -1.0018

  ## Model description

@@ -68,15 +62,15 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.6267 | 0.1047 | 100 | 0.6375 | 0.0143 | -0.1443 | 0.7063 | 0.1587 | -274.6408 | -280.5349 | -2.4244 | -2.4989 |
- | 0.578 | 0.2093 | 200 | 0.5544 | -0.7566 | -1.3732 | 0.7480 | 0.6167 | -397.5314 | -357.6234 | -1.3956 | -1.7000 |
- | 0.5301 | 0.3140 | 300 | 0.5198 | -0.9545 | -1.7880 | 0.7758 | 0.8335 | -439.0061 | -377.4185 | -0.0799 | -0.7774 |
- | 0.5386 | 0.4186 | 400 | 0.5105 | -1.3034 | -2.1080 | 0.7718 | 0.8046 | -471.0027 | -412.3046 | 0.9407 | 0.0250 |
- | 0.4996 | 0.5233 | 500 | 0.4953 | -1.7843 | -2.7473 | 0.7679 | 0.9630 | -534.9354 | -460.4000 | 1.1533 | 0.0059 |
- | 0.4664 | 0.6279 | 600 | 0.4944 | -1.6883 | -2.7073 | 0.7698 | 1.0191 | -530.9401 | -450.7948 | 1.0557 | -0.2689 |
- | 0.4716 | 0.7326 | 700 | 0.4929 | -1.5680 | -2.6003 | 0.7778 | 1.0324 | -520.2406 | -438.7653 | 0.7091 | -0.3818 |
- | 0.4741 | 0.8373 | 800 | 0.4907 | -1.5504 | -2.5743 | 0.7778 | 1.0239 | -517.6415 | -437.0119 | 0.6599 | -0.3941 |
- | 0.4741 | 0.9419 | 900 | 0.4906 | -1.5814 | -2.6280 | 0.7698 | 1.0467 | -523.0108 | -440.1050 | 0.7152 | -0.3568 |
+ | 4.5854 | 0.1047 | 100 | 4.3811 | -0.2719 | -0.4992 | 0.6488 | 0.2273 | -310.1242 | -309.1552 | -2.1923 | -2.2813 |
+ | 2.6464 | 0.2093 | 200 | 2.6063 | -9.6247 | -11.6315 | 0.625 | 2.0068 | -1423.3580 | -1244.4360 | 0.6982 | -0.3562 |
+ | 1.9069 | 0.3140 | 300 | 2.2624 | -9.8468 | -11.9256 | 0.6329 | 2.0788 | -1452.7675 | -1266.6490 | 1.5569 | 0.4590 |
+ | 1.6642 | 0.4186 | 400 | 1.6421 | -14.4918 | -17.8494 | 0.625 | 3.3576 | -2045.1493 | -1731.1526 | -0.0875 | -0.7751 |
+ | 1.6328 | 0.5233 | 500 | 1.5120 | -13.0737 | -16.3036 | 0.6389 | 3.2299 | -1890.5623 | -1589.3370 | -0.0918 | -0.6590 |
+ | 1.6032 | 0.6279 | 600 | 1.4752 | -17.3374 | -21.4238 | 0.6230 | 4.0864 | -2402.5845 | -2015.7072 | 0.6402 | 0.0190 |
+ | 1.5039 | 0.7326 | 700 | 1.3853 | -14.1299 | -17.5624 | 0.6528 | 3.4325 | -2016.4491 | -1694.9624 | -0.4968 | -0.8898 |
+ | 1.3527 | 0.8373 | 800 | 1.3663 | -13.9016 | -17.2583 | 0.6448 | 3.3567 | -1986.0359 | -1672.1306 | -0.6750 | -1.0375 |
+ | 1.5137 | 0.9419 | 900 | 1.3374 | -14.5395 | -18.1313 | 0.6409 | 3.5918 | -2073.3389 | -1735.9152 | -0.6740 | -1.0018 |


  ### Framework versions
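The reward columns in the README diff can be sanity-checked with a little arithmetic. The sketch below assumes that `Rewards/margins` is the mean of chosen minus rejected rewards, so it should equal `Rewards/chosen - Rewards/rejected` up to rounding; that relationship is an assumption about how the trainer logs these metrics, not something this commit states. The values are copied from the final evaluation figures on each side of the diff, and the dictionary and function names are illustrative only.

```python
# Final evaluation metrics copied from the two README versions in this diff.
before = {"loss": 0.4904, "chosen": -1.5809, "rejected": -2.6285,
          "margin": 1.0476, "accuracy": 0.7698}
after = {"loss": 1.3374, "chosen": -14.5395, "rejected": -18.1313,
         "margin": 3.5918, "accuracy": 0.6409}


def margin_gap(metrics):
    """Absolute difference between the reported margin and chosen - rejected."""
    return abs((metrics["chosen"] - metrics["rejected"]) - metrics["margin"])


for name, metrics in (("before", before), ("after", after)):
    derived = round(metrics["chosen"] - metrics["rejected"], 4)
    print(f"{name}: derived margin {derived}, reported {metrics['margin']}, "
          f"accuracy {metrics['accuracy']}, loss {metrics['loss']}")
```

Under that assumption both versions are internally consistent. The comparison also shows that the new run reports a larger margin (3.5918 against 1.0476) together with a lower reward accuracy (0.6409 against 0.7698) and a higher evaluation loss (1.3374 against 0.4904).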
all_results.json CHANGED
@@ -1,22 +1,9 @@
  {
  "epoch": 0.9994767137624281,
- "eval_logits/chosen": -0.35997089743614197,
- "eval_logits/rejected": 0.7096247673034668,
- "eval_logps/chosen": -440.0589599609375,
- "eval_logps/rejected": -523.0543212890625,
- "eval_loss": 0.4903615117073059,
- "eval_rewards/accuracies": 0.7698412537574768,
- "eval_rewards/chosen": -1.5809098482131958,
- "eval_rewards/margins": 1.0475648641586304,
- "eval_rewards/rejected": -2.6284749507904053,
- "eval_runtime": 176.963,
- "eval_samples": 2000,
- "eval_samples_per_second": 11.302,
- "eval_steps_per_second": 0.356,
  "total_flos": 0.0,
- "train_loss": 0.5247097397349891,
- "train_runtime": 18219.8315,
+ "train_loss": 2.1165736393154604,
+ "train_runtime": 18133.1885,
  "train_samples": 61134,
- "train_samples_per_second": 3.355,
- "train_steps_per_second": 0.052
+ "train_samples_per_second": 3.371,
+ "train_steps_per_second": 0.053
  }
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9bc27b979262e05c38af1babd0a4a1d2741b1c9eb96d39e73f9768fa7eb5c3bb
+ oid sha256:0d01b4522d4331954513b741ee229db1ee50faf8180950d2d7b946855cdc3515
  size 4943162336
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8563e99eeea3128a306b0d0ebcbdd4223b20a4f83ae5ce6c88a293c5e10b96c2
+ oid sha256:b9d70ede8f353fbc790dfdd38530b55b9faebfe3d7ab5a06afc531f4dc531553
  size 4999819336
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:dd18869baf9c51e0521149959977886cf788ec73ec4d5b01c4799c9d08c6ddee
+ oid sha256:e2ec0eb8794a7b89aac7eb2e3ef3fa32dde10b0bae20f9ab2fd46b776d16e913
  size 4540516344
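Each safetensors entry above is a Git LFS pointer file: a `version` line, an `oid` line naming a hash algorithm and digest, and a `size` in bytes. Only the `oid` changes for the three shards, so the weights were replaced with files of identical size. A minimal sketch of checking a downloaded shard against its pointer could look like the following; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any library, and the local file path passed to `matches_pointer` would be whatever location the shard was downloaded to.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so multi-gigabyte shards are not loaded whole.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# New pointer for model-00003-of-00003.safetensors, copied from the diff above.
pointer = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:e2ec0eb8794a7b89aac7eb2e3ef3fa32dde10b0bae20f9ab2fd46b776d16e913
size 4540516344
""")
print(pointer["algo"], pointer["size"])
```

Comparing the size first is cheap, but only the digest distinguishes the old and new shards here, since their sizes are unchanged.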
runs/Apr26_05-14-17_dsw-3393-565b8f59b6-l5p7k/events.out.tfevents.1714108924.dsw-3393-565b8f59b6-l5p7k.40537.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8f3a30a56d8a7df55ae8df61b8bf0c7ab25928b2fe8806ee73a64eda7ab364fe
- size 70443
+ oid sha256:3ca7d85abffa48f034963c09db20b166c6eaf7f43f9224e061784a0d538dacef
+ size 74119
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
  "epoch": 0.9994767137624281,
  "total_flos": 0.0,
- "train_loss": 0.5247097397349891,
- "train_runtime": 18219.8315,
+ "train_loss": 2.1165736393154604,
+ "train_runtime": 18133.1885,
  "train_samples": 61134,
- "train_samples_per_second": 3.355,
- "train_steps_per_second": 0.052
+ "train_samples_per_second": 3.371,
+ "train_steps_per_second": 0.053
  }
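The throughput figures in `train_results.json` can be cross-checked against the other fields. The sketch below assumes that `train_samples_per_second` is simply `train_samples` divided by `train_runtime` and rounded to three decimals; that formula is an assumption that happens to reproduce the reported values, not something stated in this commit. The `runs` dictionary is an illustrative container for numbers copied from the diff.

```python
# Values copied from the old and new sides of the train_results.json diff.
runs = {
    "before": {"train_samples": 61134, "train_runtime": 18219.8315,
               "reported_samples_per_second": 3.355},
    "after": {"train_samples": 61134, "train_runtime": 18133.1885,
              "reported_samples_per_second": 3.371},
}


def samples_per_second(run):
    """Derive throughput as samples / runtime (seconds), rounded to 3 decimals."""
    return round(run["train_samples"] / run["train_runtime"], 3)


for name, run in runs.items():
    print(name, samples_per_second(run), run["reported_samples_per_second"])
```

Both runs used the same 61134 training samples and took roughly five hours, so the jump in `train_loss` from about 0.52 to about 2.12 is not accompanied by any change in data volume or throughput.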
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff