Commit · 33e3c53
1 Parent(s): 36ffb69
Update README.md
README.md CHANGED
@@ -55,7 +55,6 @@ The training dataset was subsequently fed to [GPT2](https://huggingface.co/gpt2)
 | Batch size per device | 4 |
 | Weight decay | 0.01 |
 | Warmup ratio | 0.06 |
-| Gradient accumulation steps | 1 |
 
 After training for 3 epochs, or 465,441 steps, over a period of ~25 hours on two GeForce RTX 4090s, the model achieved a loss of 0.61.
 
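Note on the removed row: a gradient accumulation value of 1 means no accumulation, so dropping it from the table does not change the effective batch size. The arithmetic can be sketched as below. This is a rough illustration, not the author's training script; it assumes the two GPUs were used data-parallel and that the 465,441 steps are optimizer steps. The function and variable names are hypothetical.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


# Values taken from the README table and training summary.
per_device = 4
total_steps = 465_441
epochs = 3

# Assumption: both RTX 4090s ran data-parallel (not stated in the README).
num_devices = 2

batch = effective_batch_size(per_device, num_devices)  # grad_accum_steps left at 1
steps_per_epoch = total_steps // epochs

# Upper bound on examples seen per epoch; the last batch may be partial.
approx_examples_per_epoch = steps_per_epoch * batch

print(batch, steps_per_epoch, approx_examples_per_epoch)
```

If training actually ran on a single process, or the step count was reported differently, the per-epoch estimate would change accordingly.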