robinq committed on
Commit
d21f40f
1 Parent(s): 30c6cbb

Create README.md

Files changed (1)
  1. README.md +14 -0
README.md ADDED
@@ -0,0 +1,14 @@
+ ---
+ language:
+ - sv
+
+ ---
+
+ # 🤗 BERT Swedish
+
+ This BERT model was trained using the 🤗 transformers library.
+ The model is a standard BERT-base with 110M parameters.
+ The training data totaled about 70GB, consisting mostly of OSCAR (25GB) and Swedish newspaper text curated by the National Library of Sweden.
+ To avoid excessive padding, documents shorter than 512 tokens were concatenated into sequences of 512 tokens, and longer documents were split into multiple 512-token sequences, following https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_mlm.py
+
+ Training ran for slightly more than 8 epochs with a batch size of 2048, for a total of just under 125k training steps.
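
The concatenate-and-split preprocessing described in the README can be sketched roughly as follows. This is a minimal, dependency-free illustration modeled on the `group_texts` helper in the linked `run_mlm.py` script, not the exact code used to train this model. The toy batch and the small `max_seq_length` are assumptions chosen for readability, and dropping the trailing remainder is one possible choice for handling leftover tokens.

```python
from itertools import chain
from typing import Dict, List


def group_texts(
    examples: Dict[str, List[List[int]]], max_seq_length: int = 512
) -> Dict[str, List[List[int]]]:
    """Concatenate tokenized documents and cut the result into fixed-size chunks."""
    # Join every document's tokens into one long list per field (e.g. input_ids).
    concatenated = {
        key: list(chain.from_iterable(docs)) for key, docs in examples.items()
    }
    # All fields are assumed to have the same total length.
    total_length = len(next(iter(concatenated.values()), []))
    # Drop the trailing remainder that would not fill a whole chunk.
    total_length = (total_length // max_seq_length) * max_seq_length
    return {
        key: [
            tokens[i : i + max_seq_length]
            for i in range(0, total_length, max_seq_length)
        ]
        for key, tokens in concatenated.items()
    }


# Toy batch (hypothetical data): three "documents" of 3, 2 and 6 token ids,
# grouped into chunks of 4 tokens instead of 512.
batch = {"input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]]}
chunks = group_texts(batch, max_seq_length=4)
print(chunks["input_ids"])
```

Because short documents are packed together and long ones are cut at chunk boundaries, every resulting sequence is exactly `max_seq_length` tokens long and no padding tokens are needed. The trade-off is that a single sequence can span unrelated documents.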