File size: 1,827 Bytes
ee0ec3d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
## EvaLatin 2020 Models #evalatin20_models

EvaLatin 2020 Models are distributed under the
[CC BY-NC-SA](https://creativecommons.org/licenses/by-nc-sa/4.0/) licence.
The models are based solely on [EvaLatin 2020](https://github.com/CIRCSE/LT4HALA)
treebanks, and additionally use [multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md).

The models require [UDPipe 2](https://ufal.mff.cuni.cz/udpipe/2).

### Download

The latest version 200831 of the EvaLatin 2020 models
can be downloaded from [LINDAT/CLARIN repository](https://hdl.handle.net/11234/1-4803).

The models are also available in the [REST service](https://lindat.mff.cuni.cz/services/udpipe/).

### Acknowledgements #evalatin20_models_acknowledgements

This work was supported by the grant no. GX20-16819X of the Grant Agency of the
Czech Republic, and has been using language resources stored and distributed by
the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of
the Czech Republic (project LM2018101).

The models were trained on [EvaLatin 2020](https://github.com/CIRCSE/LT4HALA) treebanks.

Finally, [multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md)
is used to provide contextualized word embeddings.

### Publications

- Milan Straka, Jana Straková (2020): [UDPipe at EvaLatin 2020: Contextualized Embeddings
and Treebank Embeddings](https://arxiv.org/abs/2006.03687). In: ArXiv.org Computing Research Repository, ISSN 2331-8422, 2006.03687

### Model Performance

| Model | Dataset           |  UPOS |  Lemma |
|:------|:------------------|------:|-------:|
| latin-evalatin20-200830 | test classical    | 96.73 |  96.39 |
| latin-evalatin20-200830 | test cross-genre  | 90.47 |  86.89 |
| latin-evalatin20-200830 | test cross-time   | 87.58 |  90.59 |