Update README.md
README.md
CHANGED
@@ -248,6 +248,58 @@ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
* **Output:** Generated English-language text in response to the input, such as an answer to a question, or a summary of a document.

#### Training Hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding `TrainingArguments` follows the list):

- **learning_rate:** `3e-4`
- **train_batch_size:** effectively 4 (`per_device_train_batch_size=1` with `gradient_accumulation_steps=4`)
- **eval_batch_size:** not explicitly set; determined by the evaluation defaults
- **seed:** not explicitly stated, so exact reproducibility is not guaranteed
- **optimizer:** `paged_adamw_8bit`, a paged 8-bit AdamW variant that keeps optimizer-state memory low
- **lr_scheduler_type:** not stated; the logged learning rates decay linearly to zero over the 500 steps, consistent with a linear schedule
- **training_steps:** `500`
- **mixed_precision_training:** not explicitly mentioned
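
The training script itself is not included in this card, so the following is a best-guess sketch only: the listed values mapped onto Hugging Face `TrainingArguments`. Here `output_dir`, `logging_steps`, and the linear scheduler are assumptions inferred from the logs below, not stated in the card.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the run from the hyperparameter list above;
# anything not stated in the card is marked as an assumption.
args = TrainingArguments(
    output_dir="outputs",            # assumption: not stated in the card
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,   # effective train batch size of 4
    max_steps=500,
    optim="paged_adamw_8bit",        # paged 8-bit AdamW (requires bitsandbytes)
    lr_scheduler_type="linear",      # assumption: inferred from the logged LR decay
    logging_steps=25,                # matches the 25-step cadence of the results table
)
```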

#### Training Results

Below is a summary of the training results at every 25th step (plus the first step): training loss, gradient norm, learning rate, and the corresponding epoch.

| Training Step | Training Loss | Grad Norm | Learning Rate | Epoch |
|---------------|---------------|-----------|---------------|-------|
| 1   | 2.1426 | 1.333079 | 2.976e-04 | 0.04 |
| 25  | 1.1061 | 0.756779 | 2.856e-04 | 0.22 |
| 50  | 0.8865 | 0.601220 | 2.705e-04 | 0.44 |
| 75  | 0.9921 | 0.634705 | 2.555e-04 | 0.67 |
| 100 | 0.8814 | 0.594633 | 2.405e-04 | 0.89 |
| 125 | 0.5098 | 0.787081 | 2.255e-04 | 1.11 |
| 150 | 0.4647 | 0.577686 | 2.104e-04 | 1.33 |
| 175 | 0.4096 | 0.687792 | 1.954e-04 | 1.55 |
| 200 | 0.5006 | 0.669076 | 1.804e-04 | 1.77 |
| 225 | 0.5101 | 0.676769 | 1.653e-04 | 2.00 |
| 250 | 0.1939 | 0.656288 | 1.503e-04 | 2.22 |
| 275 | 0.2506 | 0.620012 | 1.353e-04 | 2.44 |
| 300 | 0.2050 | 0.642024 | 1.202e-04 | 2.66 |
| 325 | 0.3296 | 0.553642 | 1.052e-04 | 2.88 |
| 350 | 0.0799 | 0.331929 | 9.018e-05 | 3.10 |
| 375 | 0.0951 | 0.682525 | 7.515e-05 | 3.33 |
| 400 | 0.0927 | 0.438669 | 6.012e-05 | 3.55 |
| 425 | 0.0845 | 0.422025 | 4.509e-05 | 3.77 |
| 450 | 0.2115 | 0.718012 | 3.006e-05 | 3.99 |
| 475 | 0.0538 | 0.167244 | 1.503e-05 | 4.21 |
| 500 | 0.0438 | 0.184941 | 0.0       | 4.43 |
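
With the Hugging Face `Trainer`, a table like this can be produced from `trainer.state.log_history`, which accumulates one dict per logging step. The sketch below is standalone: the two sample entries are copied from the table above, and the `grad_norm` key is an assumption that only holds on `transformers` versions that log gradient norms.

```python
# Format Trainer-style log entries as a markdown table. With a real run this
# would be `log_history = trainer.state.log_history`; here two entries are
# copied from the table above so the snippet runs on its own.
log_history = [
    {"step": 1,  "loss": 2.1426, "grad_norm": 1.333079, "learning_rate": 2.976e-4, "epoch": 0.04},
    {"step": 25, "loss": 1.1061, "grad_norm": 0.756779, "learning_rate": 2.856e-4, "epoch": 0.22},
]

print("| Training Step | Training Loss | Grad Norm | Learning Rate | Epoch |")
print("|---|---|---|---|---|")
for entry in log_history:
    if "loss" in entry:  # real log histories also contain eval/summary entries
        print(
            f"| {entry['step']} | {entry['loss']:.4f} "
            f"| {entry.get('grad_norm', float('nan')):.6f} "
            f"| {entry['learning_rate']:.3e} | {entry['epoch']:.2f} |"
        )
```

On a real run, evaluation entries can be filtered out the same way.
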
#### Final Training Summary

| Metric                   | Value                |
|--------------------------|----------------------|
| Train Runtime            | 2457.436 s (≈41 min) |
| Train Samples per Second | 0.814                |
| Train Steps per Second   | 0.203                |
| Train Loss               | 0.4267               |
| Epoch                    | 4.43                 |
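
These figures line up with the hyperparameters above, a quick consistency check worth doing when reading a training summary. The arithmetic below is illustrative, not part of the card:

```python
# Cross-check the final summary against the hyperparameters listed earlier.
runtime_s = 2457.436
steps = 500
effective_batch = 1 * 4  # per_device_train_batch_size * gradient_accumulation_steps

print(f"{steps / runtime_s:.3f} steps/s")                      # 0.203, matches the table
print(f"{steps * effective_batch / runtime_s:.3f} samples/s")  # 0.814, matches the table
# 500 steps * 4 samples/step = 2000 samples; 2000 / 4.43 epochs ~ 451 training examples
```

The implied dataset size of roughly 450 examples is an inference from these numbers, not a stated fact.
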
## Model Card Authors

[More Information Needed]