julien-c (HF staff) committed
Commit 45a06c5
1 Parent(s): 2d7fa06

Migrate model card from transformers-repo


Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/Geotrend/bert-base-15lang-cased/README.md

---
language: multilingual

datasets: wikipedia

license: apache-2.0

widget:
- text: "Google generated 46 billion [MASK] in revenue."
- text: "Paris is the capital of [MASK]."
- text: "Algiers is the largest city in [MASK]."
- text: "Paris est la [MASK] de la France."
- text: "Paris est la capitale de la [MASK]."
- text: "L'élection américaine a eu [MASK] en novembre 2020."
- text: "تقع سويسرا في [MASK] أوروبا"
- text: "إسمي محمد وأسكن في [MASK]."
---

# bert-base-15lang-cased

We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.

Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions produce exactly the same representations as the original model, which preserves the original accuracy.

The measurements below were computed on a [Google Cloud n1-standard-1 machine (1 vCPU, 3.75 GB)](https://cloud.google.com/compute/docs/machine-types#n1_machine_type):

| Model                           | Num parameters | Size   | Memory  | Loading time |
| ------------------------------- | -------------- | ------ | ------- | ------------ |
| bert-base-multilingual-cased    | 178 million    | 714 MB | 1400 MB | 4.2 sec      |
| Geotrend/bert-base-15lang-cased | 141 million    | 564 MB | 1098 MB | 3.1 sec      |

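The parameter counts in the table can be verified with a small helper. This is a sketch, not part of the card: `count_parameters` is a name introduced here, and reproducing the actual numbers requires downloading the checkpoints.

```python
from torch import nn

def count_parameters(model: nn.Module) -> int:
    """Total number of parameters in a PyTorch model."""
    return sum(p.numel() for p in model.parameters())

# Sanity check on a tiny layer: Linear(10, 5) has 10*5 weights + 5 biases.
print(count_parameters(nn.Linear(10, 5)))  # 55

# Reproducing the card's numbers needs the real checkpoints, e.g.:
#   from transformers import AutoModel
#   model = AutoModel.from_pretrained("Geotrend/bert-base-15lang-cased")
#   print(f"{count_parameters(model) / 1e6:.0f} million parameters")
```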
Handled languages: en, fr, es, de, zh, ar, ru, vi, el, bg, th, tr, hi, ur and sw.

For more information, please refer to our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).

## How to use

```python
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the smaller multilingual BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-15lang-cased")
model = AutoModel.from_pretrained("Geotrend/bert-base-15lang-cased")
```
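The masked-language-model prompts shown in the card's widget can also be run programmatically with the `fill-mask` pipeline. A sketch, with one assumption labeled: `top_tokens` is a helper introduced here (not part of the card), and the actual pipeline call requires downloading the checkpoint.

```python
def top_tokens(predictions, k=3):
    """Return the k highest-scoring token strings from fill-mask output.

    `predictions` is a list of dicts with "token_str" and "score" keys,
    the shape produced by transformers' fill-mask pipeline.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["token_str"] for p in ranked[:k]]

# With transformers installed, the widget prompts can be run like this
# (requires downloading the checkpoint):
#   from transformers import pipeline
#   fill_mask = pipeline("fill-mask", model="Geotrend/bert-base-15lang-cased")
#   print(top_tokens(fill_mask("Paris is the capital of [MASK].")))

# Demonstration of the helper with dummy scores only:
example = [
    {"token_str": "Europe", "score": 0.03},
    {"token_str": "France", "score": 0.95},
]
print(top_tokens(example, k=1))  # ['France']
```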

To generate other smaller versions of multilingual transformers, please visit [our Github repo](https://github.com/Geotrend-research/smaller-transformers).

### How to cite

```bibtex
@inproceedings{smallermbert,
  title={Load What You Need: Smaller Versions of Multilingual BERT},
  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
  booktitle={SustaiNLP / EMNLP},
  year={2020}
}
```

## Contact

Please contact [email protected] with any questions, feedback or requests.