Update README.md
README.md
CHANGED
@@ -251,15 +251,16 @@ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
 #### Training Hyperparameters
 
 The following hyperparameters were used during training:
 
-- **learning_rate:** `3e-4`
-- **train_batch_size:** Effectively adjusted by `per_device_train_batch_size=1` and `gradient_accumulation_steps=4`
-- **eval_batch_size:** Implicitly determined by the evaluation setup (not explicitly defined)
-- **seed:** Not explicitly stated, crucial for ensuring reproducibility
-- **optimizer:** `paged_adamw_8bit`, designed for efficient memory utilization
-- **lr_scheduler_type:** Learning rate adjustments indicate adaptive scheduling, though specific type is not mentioned
-- **training_steps:** `500`
-- **mixed_precision_training:** Not explicitly mentioned; any applied strategy would aim at computational efficiency
+- learning_rate: 3e-4
+- per_device_train_batch_size: 1
+- gradient_accumulation_steps: 4
+- eval_batch_size: Implicitly determined by the evaluation setup
+- seed: Not explicitly stated
+- optimizer: paged_adamw_8bit
+- lr_scheduler_type: Not specified, adaptive adjustments indicated
+- training_steps: 500
+- mixed_precision_training: Not explicitly mentioned
 
 #### Training Results
 
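For readers reconstructing the run, the sketch below shows one way the updated hyperparameter list could map onto `transformers.TrainingArguments`. This is a minimal, assumed setup: the output directory, trainer class, dataset, and model loading are not part of the diff, and only the listed values themselves come from the README.

```python
# Minimal sketch assuming the run used transformers' Trainer / TrainingArguments.
# Only the values from the hyperparameter list above are taken from the README;
# everything else (output_dir, trainer, data) is an assumption for illustration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",            # assumed; not stated in the diff
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,   # effective batch size of 4 per device
    max_steps=500,                   # "training_steps: 500"
    optim="paged_adamw_8bit",        # bitsandbytes paged 8-bit AdamW
    # seed, lr_scheduler_type, and mixed precision (fp16/bf16) are not
    # specified in the diff; Trainer defaults (seed=42, linear schedule,
    # full precision) would apply unless set explicitly.
)
```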