Stancld committed
Commit
d7cb793
1 Parent(s): ec07e8b

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -25,7 +25,7 @@ The fine-tuned model achieves the following results on the evaluation set using

  The full training hyper-parameters and logs can be found via the following [W&B run](https://wandb.ai/stancld/LongT5/runs/1lwncl8a?workspace=user-stancld). The model was trained using the [HuggingFace's trainer](https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_seq2seq.py).

- The only specific adjustment I made for the training was dropping very short sequences (less than 16 tokens), as these sequences do not contribute to gradient creation in the *transient-global* attention, which resulted in training crashes when DDP was used.
+ The only specific adjustment I made for the training was dropping very short input articles (less than 16 words; a bit of a mistake, it should have been less than 16 tokens), as these sequences do not contribute to gradient creation in the *transient-global* attention, which resulted in training crashes when DDP was used.

  ## Usage
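
For readers who want to reproduce the adjustment described in this change, the filter simply drops examples below a length threshold before training. Below is a minimal sketch, assuming the Hugging Face `datasets` and `transformers` libraries, a `google/long-t5-tglobal-large` tokenizer, and a PubMed-style dataset with an `article` column; these names are illustrative assumptions, not taken from this commit or the author's training script.

```python
# A minimal sketch (not the author's actual preprocessing script) of the
# filtering step described in the diff above: very short examples are dropped
# before fine-tuning so that every batch element contributes gradients under
# LongT5's transient-global attention, avoiding the DDP crashes mentioned.
# The dataset name, column name, and tokenizer checkpoint are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-large")
train_data = load_dataset("ccdv/pubmed-summarization", split="train")

MIN_TOKENS = 16  # length threshold mentioned in the README

def is_long_enough(example):
    # Tokenize without truncation/padding and keep only sufficiently long articles.
    return len(tokenizer(example["article"]).input_ids) >= MIN_TOKENS

train_data = train_data.filter(is_long_enough)
```

Note that the commit itself says the run actually filtered on word count (fewer than 16 words) rather than token count; the sketch filters on tokens to match the intended behaviour described in the README.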