enkhtogtokh/mistral-dpo

Files changed (5)

README.md +13 -13
adapter_config.json +2 -2
adapter_model.safetensors +1 -1
runs/Jan16_06-26-11_c6a9bcfdc196/events.out.tfevents.1705386478.c6a9bcfdc196.594.0 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0002
-- Rewards/chosen: 24.3722
-- Rewards/rejected: -6.8482
 - Rewards/accuracies: 1.0
-- Rewards/margins: 31.2205
-- Logps/rejected: -88.3652
-- Logps/chosen: -328.0153
-- Logits/rejected: -1.3660
-- Logits/chosen: -1.8558
 ## Model description
@@ -58,11 +58,11 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.5974        | 0.01  | 10   | 0.3220          | 2.6702         | -0.4084          | 0.9712             | 3.0786          | -23.9665       | -545.0353    | -1.2378         | -1.5696       |
-| 0.0879        | 0.02  | 20   | 0.0217          | 13.3520        | -3.0537          | 1.0                | 16.4057         | -50.4196       | -438.2177    | -1.2934         | -1.7050       |
-| 0.0413        | 0.03  | 30   | 0.0015          | 20.3280        | -5.4777          | 1.0                | 25.8057         | -74.6603       | -368.4581    | -1.3375         | -1.8145       |
-| 0.0003        | 0.04  | 40   | 0.0003          | 23.8990        | -6.4549          | 1.0                | 30.3539         | -84.4315       | -332.7477    | -1.3562         | -1.8484       |
-| 0.0002        | 0.05  | 50   | 0.0002          | 24.3722        | -6.8482          | 1.0                | 31.2205         | -88.3652       | -328.0153    | -1.3660         | -1.8558       |
 ### Framework versions

 This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.0012
+- Rewards/chosen: 23.3318
+- Rewards/rejected: -6.7489
 - Rewards/accuracies: 1.0
+- Rewards/margins: 30.0806
+- Logps/rejected: -87.3513
+- Logps/chosen: -337.4500
+- Logits/rejected: -1.2781
+- Logits/chosen: -1.6769
 ## Model description
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.5937        | 0.01  | 10   | 0.3601          | 0.7833         | -0.3607          | 0.9904             | 1.1441          | -23.4698       | -562.9344    | -1.1371         | -1.4401       |
+| 0.0908        | 0.02  | 20   | 1.1245          | 8.8420         | -2.7352          | 0.9615             | 11.5772         | -47.2141       | -482.3473    | -1.1942         | -1.5504       |
+| 0.0683        | 0.03  | 30   | 0.2541          | 17.6490        | -4.7403          | 0.9904             | 22.3893         | -67.2654       | -394.2778    | -1.2341         | -1.6426       |
+| 0.0009        | 0.04  | 40   | 0.0015          | 22.5664        | -5.9863          | 1.0                | 28.5527         | -79.7251       | -345.1035    | -1.2763         | -1.6781       |
+| 0.0003        | 0.05  | 50   | 0.0012          | 23.3318        | -6.7489          | 1.0                | 30.0806         | -87.3513       | -337.4500    | -1.2781         | -1.6769       |
 ### Framework versions
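
Review note on the updated metrics: the `Rewards/margins` column appears to be `Rewards/chosen` minus `Rewards/rejected`. The sketch below checks that against the five new rows. The subtraction rule is an assumption about how the trainer reports margins, not something this commit states, and the name `margin_gap` is made up for illustration.

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins) copied from the
# "+" rows of the updated README table.
rows = [
    (10, 0.7833, -0.3607, 1.1441),
    (20, 8.8420, -2.7352, 11.5772),
    (30, 17.6490, -4.7403, 22.3893),
    (40, 22.5664, -5.9863, 28.5527),
    (50, 23.3318, -6.7489, 30.0806),
]


def margin_gap(chosen, rejected, margin):
    """Absolute difference between (chosen - rejected) and the reported margin."""
    return abs((chosen - rejected) - margin)


# Hypothetical helper output: one gap per logged step.
gaps = {step: margin_gap(c, r, m) for step, c, r, m in rows}
max_gap = max(gaps.values())

for step, gap in gaps.items():
    print(f"step {step}: |chosen - rejected - margin| = {gap:.4f}")
```

Every gap is within the four-decimal rounding of the table, so the reported margins are internally consistent with the chosen and rejected rewards.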

adapter_config.json CHANGED

@@ -19,8 +19,8 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "v_proj",
-    "q_proj"
   ],
   "task_type": "CAUSAL_LM"
 }

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "q_proj",
+    "v_proj"
   ],
   "task_type": "CAUSAL_LM"
 }
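
Review note: the only change in this hunk is the order of the two `target_modules` entries. A minimal sketch, assuming the order carries no meaning for which layers get adapted, compares the two versions as sets. The JSON fragments are reduced to the fields visible in the diff, and `same_targets` is a hypothetical helper.

```python
import json

# Reduced fragments: only the keys shown in the hunk above, not the full config.
old_fragment = '{"target_modules": ["v_proj", "q_proj"], "task_type": "CAUSAL_LM"}'
new_fragment = '{"target_modules": ["q_proj", "v_proj"], "task_type": "CAUSAL_LM"}'


def same_targets(a, b):
    """True when both configs name the same modules, ignoring list order."""
    return set(json.loads(a)["target_modules"]) == set(json.loads(b)["target_modules"])


unchanged = same_targets(old_fragment, new_fragment)
print("target_modules unchanged apart from order:", unchanged)
```

If that assumption holds, the reordering is a serialization artifact rather than a configuration change. One plausible cause, not confirmed by this commit, is that the list is held as an unordered set before being written out.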

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2690627b1ee8d50ca1e9089ca5636f40a6038c42e2782441ecd4e1b1f0f6a681
 size 6832600

 version https://git-lfs.github.com/spec/v1
+oid sha256:f80f99035c681c845263561663e67f34c2d083918f74e36fab1f698934cf052c
 size 6832600

runs/Jan16_06-26-11_c6a9bcfdc196/events.out.tfevents.1705386478.c6a9bcfdc196.594.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2c4ca71aaa1cf21c95c01c219082e5654ba85094a7d9b554c8a1c73bb881a789
+size 12559

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dd825ceee038058b8a2f5bece56f2f93a6bcb66ea5a6992b366df62855309b21
 size 4091

 version https://git-lfs.github.com/spec/v1
+oid sha256:63b7be98043f6e4899e95c30b2a63ccbdfb2a8332e1daa417547649d18aa0c8d
 size 4091
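
Review note: the binary files in this commit are stored as Git LFS pointers, so the diff shows only a changed `oid` while `size` stays the same. The sketch below parses the two `training_args.bin` pointers from the hunk above to make that explicit. `parse_lfs_pointer` is a hypothetical helper that assumes the simple `key value` line layout visible in the diff.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the old ("-") and new ("+") sides of the hunk.
old = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:dd825ceee038058b8a2f5bece56f2f93a6bcb66ea5a6992b366df62855309b21
size 4091
""")
new = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:63b7be98043f6e4899e95c30b2a63ccbdfb2a8332e1daa417547649d18aa0c8d
size 4091
""")

content_changed = old["digest"] != new["digest"]
same_size = old["size"] == new["size"]
print(f"content changed: {content_changed}, same size: {same_size}")
```

A different digest with an identical byte count means the file was rewritten with contents of the same length; the pointer alone does not reveal which values changed.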