Update README.md
README.md CHANGED
@@ -25,7 +25,7 @@ The fine-tuned model achieves the following results on the evaluation set using
The full training hyper-parameters and logs can be found via the following [W&B run](https://wandb.ai/stancld/LongT5/runs/1lwncl8a?workspace=user-stancld). The model was trained using the [HuggingFace's trainer](https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_seq2seq.py).
- The only specific adjustment, I made for the training, was dropping very short
+ The only specific adjustment I made for the training was dropping very short input articles (fewer than 16 words; a slight mistake, this should have been fewer than 16 tokens), as these sequences do not contribute to gradient creation in the *transient-global* attention, which resulted in training crashes when DDP was used.
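For illustration, a minimal sketch of such a token-length filter, assuming the PubMed `scientific_papers` dataset and the `google/long-t5-tglobal-large` tokenizer (both assumptions, not stated in this diff):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed base tokenizer and dataset; the diff does not name either explicitly.
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-large")
dataset = load_dataset("scientific_papers", "pubmed", split="train")

MIN_TOKENS = 16  # drop articles shorter than this before training


def long_enough(example):
    # Count tokens (not words), per the correction noted in the README.
    return len(tokenizer(example["article"]).input_ids) >= MIN_TOKENS


dataset = dataset.filter(long_enough)
```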
## Usage