## EvaLatin 2020 Models #evalatin20_models
EvaLatin 2020 Models are distributed under the
[CC BY-NC-SA](https://creativecommons.org/licenses/by-nc-sa/4.0/) licence.
The models are based solely on [EvaLatin 2020](https://github.com/CIRCSE/LT4HALA)
treebanks, and additionally use [multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md).
The models require [UDPipe 2](https://ufal.mff.cuni.cz/udpipe/2).
### Download
The latest version 200831 of the EvaLatin 2020 models
can be downloaded from the [LINDAT/CLARIN repository](https://hdl.handle.net/11234/1-4803).
The models are also available in the [REST service](https://lindat.mff.cuni.cz/services/udpipe/).
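A minimal sketch of querying these models through the REST service using only the Python standard library is shown below. Several details are assumptions rather than facts from this page. The `api/process` endpoint path, the `tokenizer`/`tagger`/`data` parameters, and the JSON `result` field follow the general UDPipe REST API. The model identifier `latin-evalatin20-200830` is taken from the performance table, but the service may list it under a different name. The helper `build_process_url` is hypothetical. It only builds the request URL, and the actual network call is left commented out.

```python
from urllib.parse import urlencode

# Assumed endpoint of the UDPipe REST service "process" method.
API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_process_url(text: str, model: str = "latin-evalatin20-200830") -> str:
    """Build a GET URL asking the service to tokenize, tag and lemmatize `text`.

    Hypothetical helper; the default model name is an assumption and may
    differ from the identifier the service actually exposes.
    """
    params = {
        "model": model,
        "tokenizer": "",  # presence of the option enables tokenization
        "tagger": "",     # presence of the option enables tagging and lemmatization
        "data": text,
    }
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_process_url("Gallia est omnis divisa in partes tres.")
    print(url)
    # Sending the request requires network access:
    #   import json, urllib.request
    #   with urllib.request.urlopen(url) as response:
    #       print(json.load(response)["result"])  # CoNLL-U output
```

Building the URL separately from sending it keeps the sketch testable offline. For long inputs, a POST request with the same parameters avoids URL length limits.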
### Acknowledgements #evalatin20_models_acknowledgements
This work was supported by the grant no. GX20-16819X of the Grant Agency of the
Czech Republic, and has been using language resources stored and distributed by
the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of
the Czech Republic (project LM2018101).
The models were trained on [EvaLatin 2020](https://github.com/CIRCSE/LT4HALA) treebanks.
Finally, [multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md)
is used to provide contextualized word embeddings.
### Publications
- Milan Straka, Jana Straková (2020): [UDPipe at EvaLatin 2020: Contextualized Embeddings
and Treebank Embeddings](https://arxiv.org/abs/2006.03687). In: ArXiv.org Computing Research Repository, ISSN 2331-8422, 2006.03687
### Model Performance
| Model | Dataset | UPOS | Lemma |
|:------|:------------------|------:|-------:|
| latin-evalatin20-200830 | test classical | 96.73 | 96.39 |
| latin-evalatin20-200830 | test cross-genre | 90.47 | 86.89 |
| latin-evalatin20-200830 | test cross-time | 87.58 | 90.59 |