UMCU committed on
Commit ec31852 · verified · 1 Parent(s): 79f6d33

Update README.md

Files changed (1): README.md +5 -0
README.md CHANGED
@@ -17,6 +17,11 @@ Continued, on-premise, pre-training of [MedRoBERTa.nl](https://huggingface.co/CL
 
  # Data statistics
 
+ Sources:
+ * Dutch medical guidelines
+ * NtvG papers
+ * PMC abstracts translated using GeminiFlash 1.5
+
  * Number of tokens: 1.47B, of which 1B from UMCU EHRs
  * Number of documents: 5.8M, of which 3.5M UMCU EHRs
  * Average number of tokens per document: 253