system (HF staff) committed on
Commit 738abaf
1 Parent(s): 5e29400

Update README.md

Files changed (1): README.md (+6 -6)
README.md CHANGED
@@ -11,14 +11,18 @@ license: "MIT"

## Model description

- BERT-large-uncased model, pretrained on a corpus of messages from Twitter about COVID-19
+ BERT-large-uncased model, pretrained on a corpus of messages from Twitter about COVID-19. This model is identical to [covid-twitter-bert](https://huggingface.co/digitalepidemiologylab/covid-twitter-bert) - but trained on more data, resulting in higher downstream performance.
+
+ Find more info on our [GitHub page](https://github.com/digitalepidemiologylab/covid-twitter-bert).
+

## Intended uses & limitations

+ The model can be used in the `fill-mask` task (see below).
+
#### How to use

```python
- # You can include sample code which will be formatted
from transformers import pipeline
import json

@@ -36,10 +40,6 @@ print(json.dumps(out, indent=4))
]
```

- ## Training data
- Describe the data you used to train the model.
- If you initialized it with pre-trained weights, add a link to the pre-trained model card or repository with description of the pre-training data.
-
## Training procedure
This model was trained on 97M unique tweets (1.2B training examples) collected between January 12 and July 5, 2020 containing at least one of the keywords "wuhan", "ncov", "coronavirus", "covid", or "sars-cov-2". These tweets were filtered and preprocessed to reach a final sample of 22.5M tweets (containing 40.7M sentences and 633M tokens) which were used for training.
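
The keyword filter described under "Training procedure" amounts to a case-insensitive match against five terms. A hypothetical sketch of such a filter (function and variable names are illustrative assumptions, not the authors' actual preprocessing code):

```python
# Hypothetical sketch of the keyword filter described above; names are
# illustrative and not taken from the authors' preprocessing pipeline.
KEYWORDS = ("wuhan", "ncov", "coronavirus", "covid", "sars-cov-2")

def matches_keywords(tweet_text: str) -> bool:
    """Return True if the tweet mentions at least one of the COVID-19 keywords."""
    text = tweet_text.lower()
    return any(keyword in text for keyword in KEYWORDS)

# Usage: keep only tweets that mention at least one keyword.
tweets = [
    "New SARS-CoV-2 variant reported today.",
    "Lovely weather this afternoon!",
]
print([t for t in tweets if matches_keywords(t)])  # keeps only the first tweet
```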
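
The diff elides the middle of the "How to use" code block (the pipeline construction and the call whose JSON output ends with the `]` visible above). A minimal runnable sketch of what the full example plausibly looks like, assuming this repo's model id is `digitalepidemiologylab/covid-twitter-bert-v2` (inferred from the card's link to the v1 model), rather than the commit's verbatim code:

```python
# Minimal fill-mask sketch; the model id and example sentence are assumptions,
# not reproduced verbatim from the commit.
from transformers import pipeline
import json

pipe = pipeline(task="fill-mask", model="digitalepidemiologylab/covid-twitter-bert-v2")
out = pipe(f"In places with a lot of people, it's a good idea to wear a {pipe.tokenizer.mask_token}")
print(json.dumps(out, indent=4))  # list of candidate completions with scores
```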