The model was trained on about 70GB of data, consisting mostly of OSCAR and Swedish…
To avoid excessive padding, documents shorter than 512 tokens were concatenated into single 512-token sequences, and longer documents were split into multiple 512-token sequences, following https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_mlm.py.
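The chunking step above can be sketched as follows. This is an illustrative simplification of the `group_texts` function in Hugging Face's `run_mlm.py` (the real script operates on tokenized datasets in batches); the toy token ids below are made up for demonstration:

```python
def group_texts(token_lists, block_size=512):
    """Concatenate tokenized documents and re-split into fixed-size blocks,
    so short documents are packed together and long ones are divided."""
    # Concatenate all token ids into one long sequence.
    concatenated = [t for tokens in token_lists for t in tokens]
    # Drop the trailing remainder that does not fill a whole block.
    total_length = (len(concatenated) // block_size) * block_size
    # Re-split into consecutive blocks of exactly `block_size` tokens.
    return [concatenated[i:i + block_size] for i in range(0, total_length, block_size)]

# Fake "token ids": documents of 300, 700, and 200 tokens (1200 total).
docs = [list(range(300)), list(range(700)), list(range(200))]
blocks = group_texts(docs)
print(len(blocks), [len(b) for b in blocks])  # 2 blocks of 512 tokens each
```

Packing this way means almost no `[PAD]` tokens are seen during pretraining, at the cost of blocks that can span document boundaries.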
Training ran for a little more than 8 epochs with a batch size of 2048, resulting in slightly fewer than 125k training steps.
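As a back-of-the-envelope check of those numbers (using assumed round values of 125k steps and exactly 8 epochs, not figures confirmed by the model card):

```python
# steps * batch_size = total 512-token sequences processed over training;
# dividing by the number of epochs gives the approximate corpus size in sequences.
batch_size = 2048
steps = 125_000
epochs = 8

sequences_seen = steps * batch_size           # total sequences processed
sequences_per_epoch = sequences_seen // epochs  # approx. corpus size in sequences
print(f"{sequences_per_epoch:.1e} sequences per epoch")  # ~3.2e+07
```

That is, the packed corpus amounts to roughly 32M 512-token sequences, consistent with a ~70GB text corpus.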
The model has three sister models trained on the same dataset:
- [Megatron-BERT-base-125k](https://huggingface.co/KBLab/megatron-bert-base-swedish-cased-125k)
- [Megatron-BERT-base-600k](https://huggingface.co/KBLab/megatron-bert-base-swedish-cased-600k)
- [Megatron-BERT-large-110k]()