Update README.md
README.md CHANGED
@@ -11,7 +11,7 @@ widget:
 A BERT-based language model further pre-trained from the checkpoint of [SciBERT](https://huggingface.co/allenai/scibert_scivocab_uncased).
 The dataset gathered balances scientific and general works in the agriculture domain, encompassing knowledge from different areas of agricultural research and practical knowledge.

-The corpus contains 1.
+The corpus contains 1.2 million paragraphs from the National Agricultural Library (NAL) of the US government and 5.3 million paragraphs from books and common literature from the **Agriculture Domain**.

 The self-supervised learning approach of MLM was used to train the model.
 - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words.
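As a quick illustration of the MLM objective the card describes, here is a minimal sketch of the 15% random-masking step using the Hugging Face `transformers` data collator. This is an assumption about tooling (the card does not say how masking was implemented), the example sentence is invented, and the checkpoint loaded is the SciBERT base the card links to rather than the fine-tuned model itself:

```python
# Sketch of the MLM masking step described above (assumed tooling;
# the card does not specify how masking was implemented in training).
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# SciBERT base checkpoint from the card; the fine-tuned model's own
# repo id is not given in this diff.
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")

# mlm_probability=0.15 mirrors "randomly masks 15% of the words".
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

# Invented agriculture-domain example sentence.
encoding = tokenizer("Crop rotation improves soil nitrogen levels in dryland farming.")
batch = collator([encoding])

# input_ids now has ~15% of tokens replaced (mostly by [MASK], per the
# standard BERT 80/10/10 recipe); labels keeps the original token ids at
# masked positions and -100 (ignored by the loss) everywhere else.
print(tokenizer.decode(batch["input_ids"][0]))
print(batch["labels"][0])
```

Running the snippet a few times shows different mask placements each time, which is the randomness the bullet point refers to; the model is then trained to predict the original words at exactly those positions.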