Svenni551 committed on
Commit
1f8f4cc
1 Parent(s): ef3932d

Update README.md

Files changed (1): README.md +52 -0
README.md CHANGED
@@ -248,6 +248,58 @@ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
  * **Output:** Generated English-language text in response to the input, such as an answer to a question, or a summary of a document.

+ #### Training Hyperparameters
+
+ The following hyperparameters were used during training:
+
+ - **learning_rate:** `3e-4`
+ - **train_batch_size:** `per_device_train_batch_size=1` with `gradient_accumulation_steps=4`, for an effective batch size of 4
+ - **eval_batch_size:** not explicitly set; determined by the evaluation setup's defaults
+ - **seed:** not explicitly stated (fixing one would be needed for exact reproducibility)
+ - **optimizer:** `paged_adamw_8bit`, chosen for its low memory footprint
+ - **lr_scheduler_type:** not stated, though the logged learning rates decay linearly to zero over the 500 steps, consistent with a linear schedule
+ - **training_steps:** `500`
+ - **mixed_precision_training:** not explicitly mentioned
+
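The schedule implied by these settings can be sketched in a few lines. This is a hypothetical reconstruction, not taken from the training code: the card does not state the scheduler type, so a plain linear decay with no warmup is assumed.

```python
# Hypothetical sketch (not from the model card): effective batch size and a
# linear learning-rate decay consistent with the logged values below.
BASE_LR = 3e-4
TOTAL_STEPS = 500
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 4

# Effective batch size: samples consumed per optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM

def linear_lr(step: int) -> float:
    """Linear decay from BASE_LR to 0 over TOTAL_STEPS (no warmup assumed)."""
    return BASE_LR * max(0, TOTAL_STEPS - step) / TOTAL_STEPS

print(effective_batch)           # 4
print(linear_lr(500))            # 0.0
print(round(linear_lr(250), 6))  # 0.00015, the same order as the mid-training values logged below
```

The exact per-step decrement in the logs works out to roughly `3e-4 / 499` rather than `3e-4 / 500`, so this sketch only approximates the logged values.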
+ #### Training Results
+
+ Below is a summary of the training results at step 1 and every 25th step thereafter, showing the training loss, gradient norm, learning rate, and corresponding epoch:
+
+ | Training Step | Training Loss | Grad Norm | Learning Rate          | Epoch |
+ |---------------|---------------|-----------|------------------------|-------|
+ | 1             | 2.1426        | 1.333079  | 0.0002975951903807615  | 0.04  |
+ | 25            | 1.1061        | 0.756779  | 0.0002855711422845691  | 0.22  |
+ | 50            | 0.8865        | 0.601220  | 0.00027054108216432863 | 0.44  |
+ | 75            | 0.9921        | 0.634705  | 0.00025551102204408817 | 0.67  |
+ | 100           | 0.8814        | 0.594633  | 0.00024048096192384768 | 0.89  |
+ | 125           | 0.5098        | 0.787081  | 0.0002254509018036072  | 1.11  |
+ | 150           | 0.4647        | 0.577686  | 0.00021042084168336673 | 1.33  |
+ | 175           | 0.4096        | 0.687792  | 0.00019539078156312624 | 1.55  |
+ | 200           | 0.5006        | 0.669076  | 0.00018036072144288578 | 1.77  |
+ | 225           | 0.5101        | 0.676769  | 0.00016533066132264526 | 2.0   |
+ | 250           | 0.1939        | 0.656288  | 0.00015030060120240478 | 2.22  |
+ | 275           | 0.2506        | 0.620012  | 0.00013527054108216431 | 2.44  |
+ | 300           | 0.2050        | 0.642024  | 0.00012024048096192384 | 2.66  |
+ | 325           | 0.3296        | 0.553642  | 0.00010521042084168336 | 2.88  |
+ | 350           | 0.0799        | 0.331929  | 9.018036072144289e-05  | 3.1   |
+ | 375           | 0.0951        | 0.682525  | 7.515030060120239e-05  | 3.33  |
+ | 400           | 0.0927        | 0.438669  | 6.012024048096192e-05  | 3.55  |
+ | 425           | 0.0845        | 0.422025  | 4.5090180360721445e-05 | 3.77  |
+ | 450           | 0.2115        | 0.718012  | 3.006012024048096e-05  | 3.99  |
+ | 475           | 0.0538        | 0.167244  | 1.503006012024048e-05  | 4.21  |
+ | 500           | 0.0438        | 0.184941  | 0.0                    | 4.43  |
+
+ #### Final Training Summary
+
+ | Metric                   | Value               |
+ |--------------------------|---------------------|
+ | Train Runtime            | 2457.436 s          |
+ | Train Samples per Second | 0.814               |
+ | Train Steps per Second   | 0.203               |
+ | Train Loss (mean)        | 0.42669185039401053 |
+ | Epoch                    | 4.43                |
+
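The throughput figures in the summary are mutually consistent with the 500 training steps and the effective batch size of 4. A quick sanity check, using only values from the tables above:

```python
# Sanity check: derive the reported throughput from the reported runtime,
# the 500 optimizer steps, and the effective batch size of 4
# (per_device_train_batch_size=1 x gradient_accumulation_steps=4).
runtime_s = 2457.436
steps = 500
effective_batch = 4

steps_per_second = steps / runtime_s
samples_per_second = steps_per_second * effective_batch

print(round(steps_per_second, 3))    # 0.203, as reported
print(round(samples_per_second, 3))  # 0.814, as reported
```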
  ## Model Card Authors
  [More Information Needed]