stefan-it committed on
Commit 05934a6 · verified · 1 Parent(s): 2810d42

readme: add some more details :)

Files changed (1)
  1. README.md +18 -1
README.md CHANGED
@@ -17,4 +17,21 @@ datasets:
 
  The Journaux-LM is a language model pretrained on historical French newspapers. Technically the model itself is an ELECTRA model, which was pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
 
- ⚠️ Model is soon to be uploaded, stay tuned! ⚠️
+ ## Datasets
+
+ Version 1 of the Journaux-LM was pretrained on the following publicly available datasets:
+
+ * [`PleIAs/French-PD-Newspapers`](https://huggingface.co/datasets/PleIAs/French-PD-Newspapers)
+
+ In total, the pretraining corpus has a size of 408GB.
+
+ # Changelog
+
+ * 02.11.2024: Initial version of the model. More details are coming very soon!
+
+ # Acknowledgements
+
+ Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
+ Many Thanks for providing access to the TPUs ❤️
+
+ Made from Bavarian Oberland with ❤️ and 🥨.