Update README.md
README.md CHANGED
@@ -84,20 +84,13 @@ xformers_attention: null
 
 # Tiny-Darkllama3.2-1B-Instruct
 
-This model was trained from
+This model was trained from unsloth/Llama-3.2-1B on the ChaoticNeutrals/Luminous_Opus, Synthetic-Dark-RP, Synthetic-RP datasets.
 
-## Model description
 
-More information needed
 
-## Intended uses & limitations
-
-More information needed
 
 ## Training and evaluation data
 
-More information needed
-
 ## Training procedure
 
 ### Training hyperparameters
@@ -113,7 +106,32 @@ The following hyperparameters were used during training:
 - training_steps: 20
 
 ### Training results
-
+[2025-02-11 13:09:27,300] [INFO] [axolotl.train.train:173] [PID:7240] [RANK:0] Starting trainer...
+[2025-02-11 13:09:27,706] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:203] [PID:7240] [RANK:0] gather_len_batches: [35]
+[2025-02-11 13:09:27,761] [INFO] [axolotl.callbacks.on_train_begin:39] [PID:7240] [RANK:0] The Axolotl config has been saved to the MLflow artifacts.
+{'loss': 3.4922, 'grad_norm': 9.877531051635742, 'learning_rate': 2e-05, 'epoch': 0.03}
+5% 1/20 [00:02<00:37, 1.98s/it][2025-02-11 13:09:31,221] [INFO] [axolotl.callbacks.on_step_end:127] [PID:7240] [RANK:0] cuda memory usage while training: 12.320GB (+8.604GB cache, +0.565GB misc)
+{'loss': 3.3057, 'grad_norm': 11.661816596984863, 'learning_rate': 4e-05, 'epoch': 0.06}
+{'loss': 2.4733, 'grad_norm': 8.751928329467773, 'learning_rate': 6e-05, 'epoch': 0.09}
+{'loss': 2.9842, 'grad_norm': 10.503549575805664, 'learning_rate': 8e-05, 'epoch': 0.11}
+{'loss': 2.6624, 'grad_norm': 12.645892143249512, 'learning_rate': 0.0001, 'epoch': 0.14}
+{'loss': 2.7616, 'grad_norm': 10.691230773925781, 'learning_rate': 0.00012, 'epoch': 0.17}
+{'loss': 2.9891, 'grad_norm': 10.076760292053223, 'learning_rate': 0.00014, 'epoch': 0.2}
+{'loss': 2.3745, 'grad_norm': 10.034379959106445, 'learning_rate': 0.00016, 'epoch': 0.23}
+{'loss': 2.4965, 'grad_norm': 9.778562545776367, 'learning_rate': 0.00018, 'epoch': 0.26}
+{'loss': 2.3811, 'grad_norm': 19.146963119506836, 'learning_rate': 0.0002, 'epoch': 0.29}
+{'loss': 3.3611, 'grad_norm': 14.556534767150879, 'learning_rate': 0.00018, 'epoch': 0.31}
+{'loss': 2.9619, 'grad_norm': 16.88424301147461, 'learning_rate': 0.00016, 'epoch': 0.34}
+{'loss': 2.121, 'grad_norm': 9.94941520690918, 'learning_rate': 0.00014, 'epoch': 0.37}
+{'loss': 2.1042, 'grad_norm': 23.178285598754883, 'learning_rate': 0.00012, 'epoch': 0.4}
+{'loss': 2.4722, 'grad_norm': 10.403461456298828, 'learning_rate': 0.0001, 'epoch': 0.43}
+{'loss': 2.7434, 'grad_norm': 11.339975357055664, 'learning_rate': 8e-05, 'epoch': 0.46}
+{'loss': 2.2349, 'grad_norm': 202.98793029785156, 'learning_rate': 6e-05, 'epoch': 0.49}
+{'loss': 2.3479, 'grad_norm': 10.250885009765625, 'learning_rate': 4e-05, 'epoch': 0.51}
+{'loss': 2.4169, 'grad_norm': 14.021651268005371, 'learning_rate': 2e-05, 'epoch': 0.54}
+{'loss': 3.4686, 'grad_norm': 10.988056182861328, 'learning_rate': 0.0, 'epoch': 0.57}
+{'train_runtime': 172.0118, 'train_samples_per_second': 0.116, 'train_steps_per_second': 0.116, 'train_loss': 2.707640600204468, 'epoch': 0.57}
+100% 20/20 [02:52<00:00, 8.65s/it]
 
 
 ### Framework versions
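The training-results block added by this commit is raw Axolotl console output: per-step loss records interleaved with logger and tqdm progress lines. As a rough illustration only (not part of the commit, and not an Axolotl API), here is a minimal sketch of how one could parse those dict-style records to summarise the run; it assumes the log lines are pasted verbatim into the `LOG` string (a short subset is shown).

```python
# Illustrative only: summarise the per-step records pasted under "### Training results".
# Assumes the log lines are copied verbatim into LOG (a short subset is shown here).
import ast
import re

LOG = """\
{'loss': 3.4922, 'grad_norm': 9.877531051635742, 'learning_rate': 2e-05, 'epoch': 0.03}
{'loss': 2.3811, 'grad_norm': 19.146963119506836, 'learning_rate': 0.0002, 'epoch': 0.29}
{'loss': 2.2349, 'grad_norm': 202.98793029785156, 'learning_rate': 6e-05, 'epoch': 0.49}
{'loss': 3.4686, 'grad_norm': 10.988056182861328, 'learning_rate': 0.0, 'epoch': 0.57}
"""

records = []
for line in LOG.splitlines():
    # Keep only the dict-style step records; skip logger and tqdm progress lines.
    match = re.search(r"\{.*\}", line)
    if match:
        records.append(ast.literal_eval(match.group(0)))

losses = [r["loss"] for r in records if "loss" in r]
print(f"parsed steps : {len(losses)}")
print(f"loss range   : {min(losses):.4f} .. {max(losses):.4f}")
print(f"mean loss    : {sum(losses) / len(losses):.4f}")
print(f"peak LR      : {max(r.get('learning_rate', 0.0) for r in records):g}")
```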