Update README.md

README.md (changed):

@@ -1,10 +1,10 @@
 # SocBERT model
 Pretrained model on 20GB English tweets and 72GB Reddit comments using a masked language modeling (MLM) objective.
-The tweets are from
-The Reddit comments are ramdonly sampled from all subreddits from 2015-2019.
-
+The tweets are from Archive and were collected via the Twitter Streaming API.
+The Reddit comments are randomly sampled from all subreddits from 2015 to 2019.
+SocBERT-base was pretrained on 819M sequence blocks for 100K steps.
+SocBERT-final was pretrained on 929M (819M+110M) sequence blocks for 112K (100K+12K) steps.
 We benchmarked SocBERT on 40 text classification tasks with social media data.
-The model was pre-trained on 160M sequence blocks for 950K steps of which the maximum sequence length is 128.
 
 The experiment results can be found in our paper:
 ```
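The masked language modeling (MLM) objective named in the README can be illustrated with a short, framework-free sketch: a fraction of input tokens is replaced by a `[MASK]` token, and the model is trained to recover the originals at those positions. The 15% masking rate and the whitespace "tokenizer" below are assumptions for illustration (the standard BERT recipe), not details stated in this README.

```python
import random

MASK_PROB = 0.15  # standard BERT masking rate; assumed, not stated in this README


def mask_tokens(tokens, mask_prob=MASK_PROB, rng=None):
    """Return (masked, targets).

    masked  -- the token list with some tokens replaced by "[MASK]"
    targets -- the original token at each masked position, None elsewhere;
               these are the labels the MLM loss is computed on.
    """
    rng = rng or random.Random(1)  # fixed seed so the sketch is deterministic
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets.append(tok)   # the model must predict this token
        else:
            masked.append(tok)
            targets.append(None)  # unmasked positions contribute no loss
    return masked, targets


# Toy example: whitespace tokenization of one "sequence block".
tokens = "the reddit comments are randomly sampled".split()
masked, targets = mask_tokens(tokens)
```

During real pretraining this masking is applied to token IDs over batches of fixed-length sequence blocks, and the cross-entropy loss is taken only at the masked positions.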