Stancld committed
Commit
d7cb793
1 Parent(s): ec07e8b

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -25,7 +25,7 @@ The fine-tuned model achieves the following results on the evaluation set using

  The full training hyper-parameters and logs can be found via the following [W&B run](https://wandb.ai/stancld/LongT5/runs/1lwncl8a?workspace=user-stancld). The model was trained using the [HuggingFace's trainer](https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_seq2seq.py).

- The only specific adjustment I made for the training was dropping very short sequences (less than 16 tokens), as these sequences do not contribute to gradient creation in the *transient-global* attention, which resulted in training crashes when DDP was used.
+ The only specific adjustment I made for the training was dropping very short input articles (less than 16 words; a bit of a mistake, it should have been less than 16 tokens), as these sequences do not contribute to gradient creation in the *transient-global* attention, which resulted in training crashes when DDP was used.

  ## Usage
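
For readers who want to reproduce the adjustment described in this change, the filter simply drops examples below a length threshold before training. Below is a minimal sketch, assuming the Hugging Face `datasets` and `transformers` libraries, a `google/long-t5-tglobal-large` tokenizer, and a PubMed-style dataset with an `article` column; these names are illustrative assumptions, not taken from this commit or the author's training script.

```python
# A minimal sketch (not the author's actual preprocessing script) of the
# filtering step described in the diff above: very short examples are dropped
# before fine-tuning so that every batch element contributes gradients under
# LongT5's transient-global attention, avoiding the DDP crashes mentioned.
# The dataset name, column name, and tokenizer checkpoint are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-large")
train_data = load_dataset("ccdv/pubmed-summarization", split="train")

MIN_TOKENS = 16  # length threshold mentioned in the README

def is_long_enough(example):
    # Tokenize without truncation/padding and keep only sufficiently long articles.
    return len(tokenizer(example["article"]).input_ids) >= MIN_TOKENS

train_data = train_data.filter(is_long_enough)
```

Note that the commit itself says the run actually filtered on word count (fewer than 16 words) rather than token count; the sketch filters on tokens to match the intended behaviour described in the README.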