stefan-it committed d64c4b3 (parent: f1e70b8)

readme: add initial version

Files changed (1): README.md (added, +60 -0)

---
language: fr
license: mit
tags:
- "historic french"
---
# 🤗 + 📚 dbmdz ELECTRA models

In this repository, the MDZ Digital Library team (dbmdz) at the Bavarian State
Library open-sources French Europeana ELECTRA models 🎉

# French Europeana ELECTRA

We extracted all French texts from the Europeana corpus using its `language` metadata attribute.

The resulting corpus is 63 GB in size and consists of 11,052,528,456 tokens.

According to the metadata, the training corpus mainly contains texts from the 18th to the 20th century.

Detailed information about the data and pretraining steps can be found in
[this repository](https://github.com/stefan-it/europeana-bert).

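The extraction step amounts to a metadata filter over the corpus. The sketch below is only an illustration, not the script used for this release: it assumes each Europeana record is a dict with hypothetical `language` and `text` keys and that French is tagged as `"fr"`. It counts tokens by whitespace splitting, so it will not reproduce the token count given above.

```python
from typing import Iterable, Iterator


def french_texts(records: Iterable[dict], lang_code: str = "fr") -> Iterator[str]:
    """Yield the text of every record whose `language` metadata matches lang_code.

    Assumption: records are dicts with hypothetical "language" and "text" keys;
    the real Europeana export format may differ.
    """
    for record in records:
        # Skip records in other languages and records without any text.
        if record.get("language") == lang_code and record.get("text"):
            yield record["text"]


def whitespace_token_count(texts: Iterable[str]) -> int:
    """Rough token count via whitespace splitting (illustrative only)."""
    return sum(len(text.split()) for text in texts)


sample = [
    {"language": "fr", "text": "Le Journal des débats politiques"},
    {"language": "de", "text": "Allgemeine Zeitung"},
    {"language": "fr", "text": ""},
]
print(list(french_texts(sample)))  # ['Le Journal des débats politiques']
print(whitespace_token_count(french_texts(sample)))  # 5
```

Because `french_texts` is a generator, it can stream over a large dump without loading it into memory at once.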
## Model weights

ELECTRA model weights are available for both PyTorch and TensorFlow.

* French Europeana ELECTRA (discriminator): `dbmdz/electra-base-french-europeana-cased-discriminator` - [model hub page](https://huggingface.co/dbmdz/electra-base-french-europeana-cased-discriminator/tree/main)
* French Europeana ELECTRA (generator): `dbmdz/electra-base-french-europeana-cased-generator` - [model hub page](https://huggingface.co/dbmdz/electra-base-french-europeana-cased-generator/tree/main)

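Either checkpoint can be loaded in either framework. The following is a minimal sketch, assuming the standard Transformers auto classes: `AutoModelForPreTraining` (which resolves to the ELECTRA replaced-token-detection head) for the discriminator and `AutoModelForMaskedLM` for the generator, with the `TF`-prefixed variants for TensorFlow. The helper name `load` and its `framework` switch are hypothetical and not part of this release.

```python
# Hub ids as listed in the model weights list.
MODEL_IDS = {
    "discriminator": "dbmdz/electra-base-french-europeana-cased-discriminator",
    "generator": "dbmdz/electra-base-french-europeana-cased-generator",
}


def load(component: str, framework: str = "pt"):
    """Hypothetical helper: return (tokenizer, model) for one ELECTRA component.

    framework="pt" loads PyTorch classes, framework="tf" loads TensorFlow ones.
    """
    # Validate arguments before importing anything heavy.
    if component not in MODEL_IDS:
        raise KeyError(f"unknown component: {component!r}")
    if framework not in ("pt", "tf"):
        raise ValueError(f"unknown framework: {framework!r}")

    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoTokenizer

    if framework == "pt":
        from transformers import AutoModelForMaskedLM, AutoModelForPreTraining
        model_classes = {"discriminator": AutoModelForPreTraining,
                         "generator": AutoModelForMaskedLM}
    else:
        from transformers import TFAutoModelForMaskedLM, TFAutoModelForPreTraining
        model_classes = {"discriminator": TFAutoModelForPreTraining,
                         "generator": TFAutoModelForMaskedLM}

    model_id = MODEL_IDS[component]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = model_classes[component].from_pretrained(model_id)
    return tokenizer, model
```

For example, `load("generator", framework="tf")` would download the generator and return it with its masked-LM head in TensorFlow (requires TensorFlow to be installed).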
## Results

For results on Historic NER, please refer to [this repository](https://github.com/stefan-it/europeana-bert).

## Usage

With Transformers >= 2.8, our French Europeana ELECTRA model can be loaded as follows:

```python
from transformers import AutoModel, AutoTokenizer

# Discriminator checkpoint; AutoModel returns the bare ELECTRA encoder (no task head)
tokenizer = AutoTokenizer.from_pretrained("dbmdz/electra-base-french-europeana-cased-discriminator")
model = AutoModel.from_pretrained("dbmdz/electra-base-french-europeana-cased-discriminator")
```

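`AutoModel` returns only the encoder's hidden states. If the discriminator is instead loaded with its pretraining head (for example `ElectraForPreTraining`), it emits one logit per input token, and a token is predicted as "replaced" when the sigmoid of its logit exceeds 0.5, i.e. when the logit is positive. The pure-Python helper below only illustrates that decision rule on a plain list of logits; the function name `replaced_token_flags` and the default threshold are our assumptions, not part of this repository.

```python
import math
from typing import List, Sequence


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function (no overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def replaced_token_flags(logits: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Map per-token discriminator logits to True ("replaced") or False ("original").

    Assumption: logits come from an ELECTRA discriminator head, where a higher
    score means the token is more likely to have been replaced by the generator.
    """
    return [_sigmoid(logit) > threshold for logit in logits]


print(replaced_token_flags([-2.0, 0.3, 1.5]))  # [False, True, True]
```

With the default threshold this is equivalent to checking `logit > 0`; a higher threshold trades recall for precision when flagging replaced tokens.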
# Hugging Face model hub

All models are available on the [Hugging Face model hub](https://huggingface.co/dbmdz).

# Contact (Bugs, Feedback, Contribution and more)

For questions about our ELECTRA models, just open an issue
[here](https://github.com/dbmdz/berts/issues/new) 🤗

# Acknowledgments

Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).
Thanks for providing access to the TFRC ❤️

Thanks to the generous support from the [Hugging Face](https://huggingface.co/) team,
it is possible to download our models from their S3 storage 🤗