mistral7b-ft-lora-sql-v2adapters

Files changed (3)

README.md CHANGED

@@ -20,7 +20,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.7530
 ## Model description
@@ -39,26 +39,33 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 4
 - eval_batch_size: 8
 - seed: 1399
-- gradient_accumulation_steps: 8
 - total_train_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: constant
-- lr_scheduler_warmup_steps: 10
-- training_steps: 100
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 0.9361        | 0.87  | 5    | 0.7619          |
-| 0.7039        | 1.74  | 10   | 0.6887          |
-| 0.5972        | 2.61  | 15   | 0.6736          |
-| 0.5089        | 3.48  | 20   | 0.6861          |
-| 0.4262        | 4.35  | 25   | 0.7530          |
 ### Framework versions

 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the generator dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.3640
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0003
+- train_batch_size: 8
 - eval_batch_size: 8
 - seed: 1399
+- gradient_accumulation_steps: 4
 - total_train_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 100
+- training_steps: 500
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 0.7533        | 0.06  | 20   | 0.5169          |
+| 0.4806        | 0.11  | 40   | 0.4338          |
+| 0.4285        | 0.17  | 60   | 0.4055          |
+| 0.403         | 0.23  | 80   | 0.3944          |
+| 0.3969        | 0.28  | 100  | 0.3869          |
+| 0.3898        | 0.34  | 120  | 0.3813          |
+| 0.3836        | 0.4   | 140  | 0.3766          |
+| 0.3786        | 0.45  | 160  | 0.3726          |
+| 0.3708        | 0.51  | 180  | 0.3675          |
+| 0.3681        | 0.56  | 200  | 0.3643          |
+| 0.3622        | 0.62  | 220  | 0.3631          |
+| 0.3626        | 0.68  | 240  | 0.3640          |
 ### Framework versions
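
The hyperparameter change above keeps `total_train_batch_size` at 32 while switching from a constant to a cosine schedule. The following is a minimal sketch of both, under two assumptions that the diff does not confirm: training ran on a single device, and the cosine schedule uses the common linear-warmup-then-half-cosine form. `effective_batch_size` and `cosine_lr` are hypothetical helper names for illustration, not library functions.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Samples per optimizer step. num_devices=1 is an assumption."""
    return per_device_batch * grad_accum_steps * num_devices


def cosine_lr(step, base_lr=3e-4, warmup_steps=100, total_steps=500):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero.

    Defaults mirror the new values in the diff (learning_rate 0.0003,
    lr_scheduler_warmup_steps 100, training_steps 500). The exact schedule
    formula is an assumption.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Old run: batch 4 with 8 accumulation steps; new run: batch 8 with 4.
old_total = effective_batch_size(4, 8)
new_total = effective_batch_size(8, 4)

# Learning rate at the last step logged in the new results table.
lr_at_last_logged_step = cosine_lr(240)
```

Under these assumptions both runs see the same number of samples per optimizer step, so the lower validation loss cannot be attributed to a change in effective batch size.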

adapter_config.json CHANGED

@@ -10,22 +10,22 @@
   "layers_to_transform": null,
   "loftq_config": {},
   "lora_alpha": 32,
-  "lora_dropout": 0.1,
   "megatron_config": null,
   "megatron_core": "megatron.core",
   "modules_to_save": null,
   "peft_type": "LORA",
-  "r": 8,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "v_proj",
     "gate_proj",
     "down_proj",
-    "up_proj",
     "o_proj",
-    "q_proj",
-    "k_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

   "layers_to_transform": null,
   "loftq_config": {},
   "lora_alpha": 32,
+  "lora_dropout": 0.05,
   "megatron_config": null,
   "megatron_core": "megatron.core",
   "modules_to_save": null,
   "peft_type": "LORA",
+  "r": 16,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "v_proj",
+    "up_proj",
     "gate_proj",
     "down_proj",
     "o_proj",
+    "k_proj",
+    "q_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
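
To make the effect of this hunk easier to read, here is a small sketch that compares the old and new values. It assumes the standard LoRA scaling factor `lora_alpha / r`, which holds only if rank-stabilized scaling is not enabled (that field is not visible in the hunk). The dictionaries copy only the fields shown in the diff, and `lora_scaling` is a hypothetical helper name.

```python
# Fields copied from the old and new sides of the adapter_config.json hunk.
old_cfg = {
    "r": 8,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": ["v_proj", "gate_proj", "down_proj", "up_proj",
                       "o_proj", "q_proj", "k_proj"],
}
new_cfg = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["v_proj", "up_proj", "gate_proj", "down_proj",
                       "o_proj", "k_proj", "q_proj"],
}


def lora_scaling(cfg):
    """Standard LoRA scaling, alpha / r (assumes rsLoRA is not in use)."""
    return cfg["lora_alpha"] / cfg["r"]


# The target_modules lines in the diff are a reordering, not a real change.
same_modules = set(old_cfg["target_modules"]) == set(new_cfg["target_modules"])

old_scale = lora_scaling(old_cfg)
new_scale = lora_scaling(new_cfg)
```

Read this way, the substantive changes are the doubled rank, the halved dropout, and (because `lora_alpha` stays at 32) a halved scaling factor; the module list itself is unchanged.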

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d28e288a1063d7eadcd275b6b1fbe609a924b70e454389981a4ee4c30e0777b1
 size 4920

 version https://git-lfs.github.com/spec/v1
+oid sha256:29803e7ae75bdaed47b0d6e1e4950dc7d0954291a521f7f5bb06077b22400735
 size 4920
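
Because `training_args.bin` is stored through Git LFS, the diff only shows the pointer file: the size is unchanged at 4920 bytes and only the SHA-256 `oid` differs. A minimal sketch of checking a downloaded file against such a pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the sketch assumes the three-line pointer layout shown above.

```python
import hashlib

# New-side pointer, copied from the diff.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:29803e7ae75bdaed47b0d6e1e4950dc7d0954291a521f7f5bb06077b22400735
size 4920
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of an LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """True if the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError("only sha256 pointers are handled in this sketch")
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


pointer = parse_lfs_pointer(NEW_POINTER)
```

With the real file on disk, `matches_pointer(open("training_args.bin", "rb").read(), pointer)` would confirm that the download corresponds to the new revision rather than the old one.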