Model save

Files changed (8)

README.md +19 -25
all_results.json +4 -17
model-00001-of-00003.safetensors +1 -1
model-00002-of-00003.safetensors +1 -1
model-00003-of-00003.safetensors +1 -1
runs/Apr26_05-14-17_dsw-3393-565b8f59b6-l5p7k/events.out.tfevents.1714108924.dsw-3393-565b8f59b6-l5p7k.40537.0 +2 -2
train_results.json +4 -4
trainer_state.json +0 -0

README.md CHANGED

@@ -2,15 +2,9 @@
 license: apache-2.0
 base_model: alignment-handbook/zephyr-7b-sft-full
 tags:
-- alignment-handbook
 - trl
 - dpo
 - generated_from_trainer
-- trl
-- dpo
-- generated_from_trainer
-datasets:
-- HuggingFaceH4/ultrafeedback_binarized
 model-index:
 - name: zephyr-7b-dpo-full
   results: []
@@ -21,17 +15,17 @@ should probably proofread and complete it, then remove this comment. -->
 # zephyr-7b-dpo-full
-This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4904
-- Rewards/chosen: -1.5809
-- Rewards/rejected: -2.6285
-- Rewards/accuracies: 0.7698
-- Rewards/margins: 1.0476
-- Logps/rejected: -523.0543
-- Logps/chosen: -440.0590
-- Logits/rejected: 0.7096
-- Logits/chosen: -0.3600
 ## Model description
@@ -68,15 +62,15 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6267        | 0.1047 | 100  | 0.6375          | 0.0143         | -0.1443          | 0.7063             | 0.1587          | -274.6408      | -280.5349    | -2.4244         | -2.4989       |
-| 0.578         | 0.2093 | 200  | 0.5544          | -0.7566        | -1.3732          | 0.7480             | 0.6167          | -397.5314      | -357.6234    | -1.3956         | -1.7000       |
-| 0.5301        | 0.3140 | 300  | 0.5198          | -0.9545        | -1.7880          | 0.7758             | 0.8335          | -439.0061      | -377.4185    | -0.0799         | -0.7774       |
-| 0.5386        | 0.4186 | 400  | 0.5105          | -1.3034        | -2.1080          | 0.7718             | 0.8046          | -471.0027      | -412.3046    | 0.9407          | 0.0250        |
-| 0.4996        | 0.5233 | 500  | 0.4953          | -1.7843        | -2.7473          | 0.7679             | 0.9630          | -534.9354      | -460.4000    | 1.1533          | 0.0059        |
-| 0.4664        | 0.6279 | 600  | 0.4944          | -1.6883        | -2.7073          | 0.7698             | 1.0191          | -530.9401      | -450.7948    | 1.0557          | -0.2689       |
-| 0.4716        | 0.7326 | 700  | 0.4929          | -1.5680        | -2.6003          | 0.7778             | 1.0324          | -520.2406      | -438.7653    | 0.7091          | -0.3818       |
-| 0.4741        | 0.8373 | 800  | 0.4907          | -1.5504        | -2.5743          | 0.7778             | 1.0239          | -517.6415      | -437.0119    | 0.6599          | -0.3941       |
-| 0.4741        | 0.9419 | 900  | 0.4906          | -1.5814        | -2.6280          | 0.7698             | 1.0467          | -523.0108      | -440.1050    | 0.7152          | -0.3568       |
 ### Framework versions

 license: apache-2.0
 base_model: alignment-handbook/zephyr-7b-sft-full
 tags:
 - trl
 - dpo
 - generated_from_trainer
 model-index:
 - name: zephyr-7b-dpo-full
   results: []
 # zephyr-7b-dpo-full
+This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.3374
+- Rewards/chosen: -14.5395
+- Rewards/rejected: -18.1313
+- Rewards/accuracies: 0.6409
+- Rewards/margins: 3.5918
+- Logps/rejected: -2073.3389
+- Logps/chosen: -1735.9152
+- Logits/rejected: -0.6740
+- Logits/chosen: -1.0018
 ## Model description
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 4.5854        | 0.1047 | 100  | 4.3811          | -0.2719        | -0.4992          | 0.6488             | 0.2273          | -310.1242      | -309.1552    | -2.1923         | -2.2813       |
+| 2.6464        | 0.2093 | 200  | 2.6063          | -9.6247        | -11.6315         | 0.625              | 2.0068          | -1423.3580     | -1244.4360   | 0.6982          | -0.3562       |
+| 1.9069        | 0.3140 | 300  | 2.2624          | -9.8468        | -11.9256         | 0.6329             | 2.0788          | -1452.7675     | -1266.6490   | 1.5569          | 0.4590        |
+| 1.6642        | 0.4186 | 400  | 1.6421          | -14.4918       | -17.8494         | 0.625              | 3.3576          | -2045.1493     | -1731.1526   | -0.0875         | -0.7751       |
+| 1.6328        | 0.5233 | 500  | 1.5120          | -13.0737       | -16.3036         | 0.6389             | 3.2299          | -1890.5623     | -1589.3370   | -0.0918         | -0.6590       |
+| 1.6032        | 0.6279 | 600  | 1.4752          | -17.3374       | -21.4238         | 0.6230             | 4.0864          | -2402.5845     | -2015.7072   | 0.6402          | 0.0190        |
+| 1.5039        | 0.7326 | 700  | 1.3853          | -14.1299       | -17.5624         | 0.6528             | 3.4325          | -2016.4491     | -1694.9624   | -0.4968         | -0.8898       |
+| 1.3527        | 0.8373 | 800  | 1.3663          | -13.9016       | -17.2583         | 0.6448             | 3.3567          | -1986.0359     | -1672.1306   | -0.6750         | -1.0375       |
+| 1.5137        | 0.9419 | 900  | 1.3374          | -14.5395       | -18.1313         | 0.6409             | 3.5918          | -2073.3389     | -1735.9152   | -0.6740         | -1.0018       |
 ### Framework versions

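Both versions of the eval summary are internally consistent in one useful way. In TRL's DPO logging, `Rewards/margins` is the average of the per-pair difference `Rewards/chosen - Rewards/rejected`, so it should equal the difference of the two reported means up to rounding. The following minimal sketch checks this against figures transcribed from the old and new cards. `DpoEvalSummary` and `margin_is_consistent` are hypothetical names, and the 2e-4 tolerance is an assumption chosen to absorb the four-decimal rounding of three values.

```python
from dataclasses import dataclass


@dataclass
class DpoEvalSummary:
    """Hypothetical container for three DPO eval metrics copied from a model card."""
    label: str
    rewards_chosen: float
    rewards_rejected: float
    rewards_margins: float


def margin_is_consistent(s: DpoEvalSummary, tol: float = 2e-4) -> bool:
    """True if the reported margin equals chosen - rejected within tolerance.

    Each card value is rounded to 4 decimals (max error 0.5e-4 each), so three
    rounded values can disagree by up to 1.5e-4; tol=2e-4 is an assumed bound.
    """
    gap = s.rewards_chosen - s.rewards_rejected
    return abs(gap - s.rewards_margins) <= tol


# Final eval summaries transcribed from the old (-) and new (+) README.
summaries = [
    DpoEvalSummary("old card", -1.5809, -2.6285, 1.0476),
    DpoEvalSummary("new card", -14.5395, -18.1313, 3.5918),
]

for s in summaries:
    gap = s.rewards_chosen - s.rewards_rejected
    print(f"{s.label}: chosen - rejected = {gap:.4f}, "
          f"reported margin = {s.rewards_margins:.4f}, "
          f"consistent = {margin_is_consistent(s)}")
```

The same check also passes row by row on both training-log tables. This makes it a quick way to catch transcription errors when copying these metrics elsewhere.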
all_results.json CHANGED

@@ -1,22 +1,9 @@
 {
     "epoch": 0.9994767137624281,
-    "eval_logits/chosen": -0.35997089743614197,
-    "eval_logits/rejected": 0.7096247673034668,
-    "eval_logps/chosen": -440.0589599609375,
-    "eval_logps/rejected": -523.0543212890625,
-    "eval_loss": 0.4903615117073059,
-    "eval_rewards/accuracies": 0.7698412537574768,
-    "eval_rewards/chosen": -1.5809098482131958,
-    "eval_rewards/margins": 1.0475648641586304,
-    "eval_rewards/rejected": -2.6284749507904053,
-    "eval_runtime": 176.963,
-    "eval_samples": 2000,
-    "eval_samples_per_second": 11.302,
-    "eval_steps_per_second": 0.356,
     "total_flos": 0.0,
-    "train_loss": 0.5247097397349891,
-    "train_runtime": 18219.8315,
     "train_samples": 61134,
-    "train_samples_per_second": 3.355,
-    "train_steps_per_second": 0.052
 }

 {
     "epoch": 0.9994767137624281,
     "total_flos": 0.0,
+    "train_loss": 2.1165736393154604,
+    "train_runtime": 18133.1885,
     "train_samples": 61134,
+    "train_samples_per_second": 3.371,
+    "train_steps_per_second": 0.053
 }

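Read side by side, the new `all_results.json` has dropped every `eval_*` key. It also reports a different `train_loss` and runtime. The sketch below confirms this with only the standard library, using dictionaries transcribed from the two sides of the diff. `diff_results` is a hypothetical helper, not part of `transformers`.

```python
from typing import Any

# Transcribed from the old (-) side of all_results.json.
OLD_ALL_RESULTS: dict[str, Any] = {
    "epoch": 0.9994767137624281,
    "eval_logits/chosen": -0.35997089743614197,
    "eval_logits/rejected": 0.7096247673034668,
    "eval_logps/chosen": -440.0589599609375,
    "eval_logps/rejected": -523.0543212890625,
    "eval_loss": 0.4903615117073059,
    "eval_rewards/accuracies": 0.7698412537574768,
    "eval_rewards/chosen": -1.5809098482131958,
    "eval_rewards/margins": 1.0475648641586304,
    "eval_rewards/rejected": -2.6284749507904053,
    "eval_runtime": 176.963,
    "eval_samples": 2000,
    "eval_samples_per_second": 11.302,
    "eval_steps_per_second": 0.356,
    "total_flos": 0.0,
    "train_loss": 0.5247097397349891,
    "train_runtime": 18219.8315,
    "train_samples": 61134,
    "train_samples_per_second": 3.355,
    "train_steps_per_second": 0.052,
}

# Transcribed from the new (+) side of all_results.json.
NEW_ALL_RESULTS: dict[str, Any] = {
    "epoch": 0.9994767137624281,
    "total_flos": 0.0,
    "train_loss": 2.1165736393154604,
    "train_runtime": 18133.1885,
    "train_samples": 61134,
    "train_samples_per_second": 3.371,
    "train_steps_per_second": 0.053,
}


def diff_results(old: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]:
    """Summarize keys removed, added, and changed between two results dicts."""
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    changed = {
        k: (old[k], new[k])
        for k in sorted(old.keys() & new.keys())
        if old[k] != new[k]
    }
    return {"removed": removed, "added": added, "changed": changed}


summary = diff_results(OLD_ALL_RESULTS, NEW_ALL_RESULTS)
print("removed:", summary["removed"])
for key, (before, after) in summary["changed"].items():
    print(f"{key}: {before} -> {after}")
```

Running the same helper on the new `all_results.json` against the new `train_results.json` would report no differences. Both files now carry identical fields.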
model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9bc27b979262e05c38af1babd0a4a1d2741b1c9eb96d39e73f9768fa7eb5c3bb
 size 4943162336

 version https://git-lfs.github.com/spec/v1
+oid sha256:0d01b4522d4331954513b741ee229db1ee50faf8180950d2d7b946855cdc3515
 size 4943162336

model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8563e99eeea3128a306b0d0ebcbdd4223b20a4f83ae5ce6c88a293c5e10b96c2
 size 4999819336

 version https://git-lfs.github.com/spec/v1
+oid sha256:b9d70ede8f353fbc790dfdd38530b55b9faebfe3d7ab5a06afc531f4dc531553
 size 4999819336

model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dd18869baf9c51e0521149959977886cf788ec73ec4d5b01c4799c9d08c6ddee
 size 4540516344

 version https://git-lfs.github.com/spec/v1
+oid sha256:e2ec0eb8794a7b89aac7eb2e3ef3fa32dde10b0bae20f9ab2fd46b776d16e913
 size 4540516344

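The three `.safetensors` entries are Git LFS pointer files. Only the `oid sha256:` line changes while `size` stays the same, meaning each shard was rewritten with the same byte length but different contents. The sketch below parses such a pointer and checks a downloaded file against it. It assumes the three-line layout shown above (`version`, `oid sha256:<hex>`, `size <bytes>`). `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not Git LFS or `huggingface_hub` APIs.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if not fields.get("oid", "").startswith("sha256:"):
        raise ValueError("expected an 'oid sha256:<hex>' line")
    return fields


def matches_pointer(path: Path, pointer: dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size, then its streamed SHA-256, against a pointer."""
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so multi-GB shards are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"].removeprefix("sha256:")


# New (+) pointer for the first shard, copied from the diff above.
NEW_SHARD_1 = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0d01b4522d4331954513b741ee229db1ee50faf8180950d2d7b946855cdc3515
size 4943162336
"""

pointer = parse_lfs_pointer(NEW_SHARD_1)
print(pointer["oid"], pointer["size"])
# With the shard downloaded locally (path is an assumption):
# matches_pointer(Path("model-00001-of-00003.safetensors"), pointer)
```

Checking the size first is a cheap early exit. In this commit it cannot distinguish the old shards from the new ones, so only the hash comparison does.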
runs/Apr26_05-14-17_dsw-3393-565b8f59b6-l5p7k/events.out.tfevents.1714108924.dsw-3393-565b8f59b6-l5p7k.40537.0 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8f3a30a56d8a7df55ae8df61b8bf0c7ab25928b2fe8806ee73a64eda7ab364fe
-size 70443

 version https://git-lfs.github.com/spec/v1
+oid sha256:3ca7d85abffa48f034963c09db20b166c6eaf7f43f9224e061784a0d538dacef
+size 74119

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
     "epoch": 0.9994767137624281,
     "total_flos": 0.0,
-    "train_loss": 0.5247097397349891,
-    "train_runtime": 18219.8315,
     "train_samples": 61134,
-    "train_samples_per_second": 3.355,
-    "train_steps_per_second": 0.052
 }

 {
     "epoch": 0.9994767137624281,
     "total_flos": 0.0,
+    "train_loss": 2.1165736393154604,
+    "train_runtime": 18133.1885,
     "train_samples": 61134,
+    "train_samples_per_second": 3.371,
+    "train_steps_per_second": 0.053
 }

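The throughput field in `train_results.json` can be recomputed from the other values on each side of the diff. The Hugging Face `Trainer` appears to report `train_samples_per_second` as the number of training samples divided by `train_runtime` (seconds), rounded to three decimals. The sketch below assumes a single pass over the 61,134 training samples, which is consistent with `epoch` ≈ 1.0. `recomputed_throughput` is a hypothetical helper.

```python
# Values transcribed from the old (-) and new (+) sides of train_results.json.
RUNS = {
    "old": {"train_runtime": 18219.8315, "train_samples": 61134,
            "train_samples_per_second": 3.355},
    "new": {"train_runtime": 18133.1885, "train_samples": 61134,
            "train_samples_per_second": 3.371},
}


def recomputed_throughput(results: dict) -> float:
    """Samples per second, rounded to 3 decimals like the reported field.

    Assumes one epoch, i.e. the sample count equals train_samples.
    """
    return round(results["train_samples"] / results["train_runtime"], 3)


for name, results in RUNS.items():
    print(name, recomputed_throughput(results), results["train_samples_per_second"])
```

`train_steps_per_second` cannot be checked the same way. The total number of optimizer steps is not recorded in this file.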
trainer_state.json CHANGED

The diff for this file is too large to render.