yguo262 committed
Commit 950d211 · Parent: 54f1db1

Update README.md

Files changed (1): README.md (+4 −4)
README.md CHANGED

````diff
@@ -1,10 +1,10 @@
 # SocBERT model
 Pretrained model on 20GB English tweets and 72GB Reddit comments using a masked language modeling (MLM) objective.
-The tweets are from [Archive](https://archive.org/details/twitterstream) and collected from Twitter Streaming API.
-The Reddit comments are ramdonly sampled from all subreddits from 2015-2019.
-The model was trained from scratch following the model architecture of RoBERTa-base.
+The tweets are from Archive and collected from Twitter Streaming API.
+The Reddit comments are ramdonly sampled from all subreddits from 2015-2019.
+SocBERT-base was pretrained on 819M sequence blocks for 100K steps.
+SocBERT-final was pretrained on 929M (819M+110M) sequence blocks for 112K (100K+12K) steps.
 We benchmarked SocBERT, on 40 text classification tasks with social media data.
-The model was pre-trained on 160M sequence blocks for 950K steps of which the maximum sequence length is 128.
 
 The experiment results can be found in our paper:
 ```
````
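The README above pretrains with a masked language modeling (MLM) objective. As a rough illustration only, here is a minimal sketch of what MLM masking does; the `mask_tokens` helper and the 15% mask rate follow the standard BERT recipe and are assumptions for illustration, not code from this repository:

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder token the model must fill in

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace ~mask_prob of tokens with [MASK].

    Returns (masked_tokens, labels): labels hold the original token at
    each masked position and None elsewhere, so the model is only
    scored on the positions it had to reconstruct.
    """
    rng = random.Random(seed)  # seeded for a reproducible sketch
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

# Toy usage on whitespace-split words:
words = "the model was pretrained on english tweets and reddit comments".split()
masked, labels = mask_tokens(words)
```

In real pretraining the masking operates on subword ids rather than words, and the BERT recipe additionally replaces some selected positions with random tokens or leaves them unchanged (the 80/10/10 split); this sketch omits those details.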