Update README.md
README.md
CHANGED
@@ -17,6 +17,11 @@ Continued, on-premise, pre-training of [MedRoBERTa.nl](https://huggingface.co/CL
 
 # Data statistics
 
+Sources:
+* Dutch medical guidelines
+* NtvG papers
+* PMC abstracts translated using GeminiFlash 1.5
+
 * Number of tokens: 1.47B, of which 1B from UMCU EHRs
 * Number of documents: 5.8M, of which 3.5M UMCU EHRs
 * Average number of tokens per document: 253
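
The three data statistics can be cross-checked against each other with a little arithmetic. The sketch below is not part of the commit. It assumes the rounded figures quoted in the README (1.47B tokens, 5.8M documents) are close enough to the exact counts, which the README does not give, and the variable names are illustrative only.

```python
# Rounded figures as reported in the README; the exact counts are not given,
# so the derived values are approximate.
total_tokens = 1.47e9   # all tokens
ehr_tokens = 1.0e9      # tokens from UMCU EHRs
total_docs = 5.8e6      # all documents
ehr_docs = 3.5e6        # documents from UMCU EHRs

# Average document length implied by the two totals.
avg_tokens_per_doc = total_tokens / total_docs

# Fraction of the corpus that comes from UMCU EHRs, by tokens and by documents.
ehr_token_share = ehr_tokens / total_tokens
ehr_doc_share = ehr_docs / total_docs

print(f"average tokens per document: {avg_tokens_per_doc:.0f}")
print(f"share of tokens from UMCU EHRs: {ehr_token_share:.0%}")
print(f"share of documents from UMCU EHRs: {ehr_doc_share:.0%}")
```

With these inputs the implied average rounds to 253, matching the reported value, and UMCU EHRs account for roughly 68% of tokens and 60% of documents.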