The model was trained on about 70GB of data, consisting mostly of OSCAR and Swedish…
To avoid excessive padding, documents shorter than 512 tokens were concatenated into single 512-token sequences, and longer documents were split into multiple 512-token sequences, following https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_mlm.py.
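The chunking step above can be sketched as follows. This is an illustrative simplification of the `group_texts` function in Hugging Face's `run_mlm.py` (the real script operates on tokenized datasets in batches); the toy token ids below are made up for demonstration:

```python
def group_texts(token_lists, block_size=512):
    """Concatenate tokenized documents and re-split into fixed-size blocks,
    so short documents are packed together and long ones are divided."""
    # Concatenate all token ids into one long sequence.
    concatenated = [t for tokens in token_lists for t in tokens]
    # Drop the trailing remainder that does not fill a whole block.
    total_length = (len(concatenated) // block_size) * block_size
    # Re-split into consecutive blocks of exactly `block_size` tokens.
    return [concatenated[i:i + block_size] for i in range(0, total_length, block_size)]

# Fake "token ids": documents of 300, 700, and 200 tokens (1200 total).
docs = [list(range(300)), list(range(700)), list(range(200))]
blocks = group_texts(docs)
print(len(blocks), [len(b) for b in blocks])  # 2 blocks of 512 tokens each
```

Packing this way means almost no `[PAD]` tokens are seen during pretraining, at the cost of blocks that can span document boundaries.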
Training ran for a little more than 8 epochs with a batch size of 2048, resulting in slightly fewer than 125k training steps.
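As a back-of-the-envelope check of those numbers (using assumed round values of 125k steps and exactly 8 epochs, not figures confirmed by the model card):

```python
# steps * batch_size = total 512-token sequences processed over training;
# dividing by the number of epochs gives the approximate corpus size in sequences.
batch_size = 2048
steps = 125_000
epochs = 8

sequences_seen = steps * batch_size           # total sequences processed
sequences_per_epoch = sequences_seen // epochs  # approx. corpus size in sequences
print(f"{sequences_per_epoch:.1e} sequences per epoch")  # ~3.2e+07
```

That is, the packed corpus amounts to roughly 32M 512-token sequences, consistent with a ~70GB text corpus.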
The model has three sister models trained on the same dataset:
- [Megatron-BERT-base-125k](https://huggingface.co/KBLab/megatron-bert-base-swedish-cased-125k)
- [Megatron-BERT-base-600k](https://huggingface.co/KBLab/megatron-bert-base-swedish-cased-600k)
- [Megatron-BERT-large-110k]()