dmariko committed
Commit 302ca49
1 Parent(s): 9935e82

HuggingFaceTB/SmolLM-360M-Instruct
README.md CHANGED
@@ -20,7 +20,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [HuggingFaceTB/SmolLM-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-360M-Instruct) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.0979
+- Loss: 2.0409
 
 ## Model description
 
@@ -40,24 +40,29 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0002
-- train_batch_size: 4
-- eval_batch_size: 4
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 4
+- num_devices: 8
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 64
-- total_eval_batch_size: 16
+- total_train_batch_size: 256
+- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: constant
 - lr_scheduler_warmup_ratio: 0.03
-- num_epochs: 1
+- num_epochs: 6
 
 ### Training results
 
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 2.1319        | 0.9880 | 41   | 2.0979          |
+| 2.3444        | 0.9524 | 10   | 2.3038          |
+| 2.2609        | 2.0    | 21   | 2.1919          |
+| 2.1732        | 2.9524 | 31   | 2.1334          |
+| 2.1255        | 4.0    | 42   | 2.0941          |
+| 2.0831        | 4.9524 | 52   | 2.0632          |
+| 2.0563        | 5.7143 | 60   | 2.0409          |
 
 
 ### Framework versions
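The updated hyperparameters are internally consistent: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps = 8 × 8 × 4 = 256, and likewise 8 × 8 = 64 for evaluation. As a minimal sketch (not taken from this commit), here is how those values would map onto transformers' TrainingArguments; output_dir is a placeholder, and the Adam betas/epsilon listed in the README are the library defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

# Effective batch size check:
# total_train_batch_size = per_device_train_batch_size * num_devices * gradient_accumulation_steps
#                        = 8 * 8 * 4 = 256
training_args = TrainingArguments(
    output_dir="smollm-360m-instruct-ft",  # placeholder, not from this commit
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=6,
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    seed=42,
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults,
    # matching the optimizer line in the README.
)
```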
adapter_config.json CHANGED
@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "v_proj",
-    "k_proj",
-    "o_proj",
     "gate_proj",
     "up_proj",
+    "o_proj",
+    "q_proj",
+    "k_proj",
     "down_proj",
-    "q_proj"
+    "v_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d80fa55a96409d6595e0f792589a79de124eb42f9a34027b26b1c5caf68d437e
+oid sha256:e0216683abaa9c920d78c4c5a90aed2f701cdb4ae4b481dc09618211a026e35d
 size 17426248
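The updated adapter_model.safetensors is a ~17 MB LoRA checkpoint, stored here as a Git LFS pointer. A minimal sketch of loading it on top of the base model with peft; the adapter repo id below is a placeholder for wherever this commit lives on the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model the adapter was trained against.
base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-360M-Instruct")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-360M-Instruct")

# Attach the fine-tuned LoRA weights (placeholder repo id).
model = PeftModel.from_pretrained(base, "dmariko/SmolLM-360M-Instruct-ft")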
runs/Aug28_12-28-25_algo-2/events.out.tfevents.1724848122.algo-2.67.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:efd83f35209d4359fa660148c448c0d88a28d93d6ce79706482a2789df07ac8e
+size 8594
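The new run directory contains a TensorBoard event file, also stored as an LFS pointer. A minimal sketch of inspecting it after cloning the repo with git-lfs, assuming a local tensorboard installation; the tag names printed will depend on what the trainer logged:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point the accumulator at the run directory containing the event file.
acc = EventAccumulator("runs/Aug28_12-28-25_algo-2")
acc.Reload()
print(acc.Tags())  # lists available scalar/histogram tags, e.g. the training loss
```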
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a8767dcfa891e294a477c1427a36cacd1062a407291969d81ed9b60519f9db3b
+oid sha256:d57dd6dc0a640f00c6f19e312c5fd1fd07932eb590221acbbcd4acbb9a2c7ba0
 size 5240