readme: add some more details :)
The Journaux-LM is a language model pretrained on historical French newspapers. Technically, it is an ELECTRA model pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
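ELECTRA-style pretraining (which TEAMS extends) trains a discriminator to spot tokens that a small generator has replaced in the input. A minimal, library-free sketch of the data side of that objective — the random replacement below merely stands in for the trained generator, and all names are illustrative:

```python
import random

def corrupt_and_label(tokens, vocab, replace_prob=0.15, seed=0):
    """Replace a fraction of tokens with sampled ones (standing in for
    ELECTRA's generator) and return per-token labels:
    1 = token was replaced, 0 = token is original.
    Purely illustrative -- real ELECTRA/TEAMS samples from a trained generator."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            replacement = rng.choice(vocab)
            # If the sampled token happens to equal the original,
            # the discriminator target stays 0, as in ELECTRA.
            corrupted.append(replacement)
            labels.append(0 if replacement == tok else 1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

tokens = "le journal de paris publie les nouvelles".split()
vocab = sorted(set(tokens)) + ["ville", "jour"]
corrupted, labels = corrupt_and_label(tokens, vocab)
```

The discriminator is then trained with a binary classification loss on these per-token labels.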
## Datasets
Version 1 of the Journaux-LM was pretrained on the following publicly available dataset:
* [`PleIAs/French-PD-Newspapers`](https://huggingface.co/datasets/PleIAs/French-PD-Newspapers)
In total, the pretraining corpus has a size of 408 GB.

## Changelog

* 02.11.2024: Initial version of the model. More details are coming very soon!

## Acknowledgements

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs ❤️

Made from Bavarian Oberland with ❤️ and 🥨.