---
library_name: PyLaia
license: mit
tags:
- PyLaia
- PyTorch
- Handwritten text recognition
metrics:
- CER
- WER
language:
- 'lat'
---

# HOME-Alcar and Himanis handwritten text recognition

This model performs Handwritten Text Recognition in Latin.

## Model description

The model was trained with the PyLaia library on the [HOME-Alcar](https://zenodo.org/record/5600884) document images.
Training images were resized to a fixed height of 128 pixels, preserving the original aspect ratio.
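
As an illustration of the resizing described above, the following minimal sketch computes the dimensions a line image would have after scaling to a height of 128 pixels with its aspect ratio kept. The function name `resized_size` and the rounding choice are assumptions made for this example; they are not taken from PyLaia or from the training pipeline.

```python
def resized_size(width, height, target_height=128):
    """Return (new_width, new_height) for a fixed target height.

    Hypothetical helper: the width is scaled by the same factor as the
    height so that the aspect ratio is preserved.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target_height / height
    # Rounding to the nearest pixel is an assumption; keep at least 1 pixel.
    new_width = max(1, round(width * scale))
    return new_width, target_height


# A 2000x256 line image is halved in both dimensions.
print(resized_size(2000, 256))
```

The resulting size could then be passed to the resize function of any image library before feeding the image to the model.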

## Evaluation results

The model achieves the following character error rate (CER) and word error rate (WER) on each dataset:

Himanis:

| set   | CER (%)    | WER (%)   | support   |
| ----- | ---------- | --------- | --------- |
| train | 5.31       | 17.47     |   18503   |
| val   | 10.37      | 27.63     |    2367   |
| test  | 9.87       | 28.27     |    2241   |


HOME-Alcar:

| set   | CER (%)    | WER (%)   | support   |
| ----- | ---------- | --------- | --------- |
| train | 4.74       | 17.29     |   59969   |
| val   | 7.82       | 23.67     |    7905   |
| test  | 8.34       | 24.57     |    6932   |
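
For readers unfamiliar with the metrics, here is a minimal sketch of how CER and WER are conventionally computed: the Levenshtein edit distance between reference and prediction, divided by the reference length, at character level and word level respectively. This card does not state the exact evaluation procedure, so the corpus-level aggregation and whitespace tokenization below are assumptions, and the sample strings are invented for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit costs)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def error_rate(references, hypotheses, tokenize):
    """Total edit distance over total reference length, in percent."""
    errors = 0
    length = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        length += len(ref_tokens)
    if length == 0:
        raise ValueError("references contain no tokens")
    return 100.0 * errors / length


def cer(references, hypotheses):
    # Character level: every character, including spaces, is a token.
    return error_rate(references, hypotheses, list)


def wer(references, hypotheses):
    # Word level: tokens are whitespace-separated words (an assumption).
    return error_rate(references, hypotheses, str.split)


refs = ["in nomine domini"]   # invented reference transcription
hyps = ["in nomine domine"]   # invented prediction with one wrong character
print(f"CER: {cer(refs, hyps):.2f}%  WER: {wer(refs, hyps):.2f}%")
```

A single wrong character counts as one character error but makes the whole word wrong, which is why WER is typically much higher than CER in the tables above.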

## How to use

Please refer to the PyLaia library page (https://pypi.org/project/pylaia/) to use this model.

## Cite us!

```bibtex
@inproceedings{10.1007/978-3-031-06555-2_29,
author = {Monroc, Claire Bizon and Miret, Blanche and Bonhomme, Marie-Laurence and Kermorvant, Christopher},
title = {A Comprehensive Study Of Open-Source Libraries For Named Entity Recognition On Handwritten Historical Documents},
year = {2022},
isbn = {978-3-031-06554-5},
publisher = {Springer-Verlag},
address = {Berlin, Heidelberg},
url = {https://doi.org/10.1007/978-3-031-06555-2_29},
doi = {10.1007/978-3-031-06555-2_29},
abstract = {In this paper, we propose an evaluation of several state-of-the-art open-source natural language processing (NLP) libraries for named entity recognition (NER) on handwritten historical documents: spaCy, Stanza and Flair. The comparison is carried out on three low-resource multilingual datasets of handwritten historical documents: HOME (a multilingual corpus of medieval charters), Balsac (a corpus of parish records from Quebec), and Esposalles (a corpus of marriage records in Catalan). We study the impact of the document recognition processes (text line detection and handwriting recognition) on the performance of the NER. We show that current off-the-shelf NER libraries yield state-of-the-art results, even on low-resource languages or multilingual documents using multilingual models. We show, in an end-to-end evaluation, that text line detection errors have a greater impact than handwriting recognition errors. Finally, we also report state-of-the-art results on the public Esposalles dataset.},
booktitle = {Document Analysis Systems: 15th IAPR International Workshop, DAS 2022, La Rochelle, France, May 22–25, 2022, Proceedings},
pages = {429--444},
numpages = {16},
keywords = {Text line detection, Named entity recognition, Handwritten historical documents},
location = {La Rochelle, France}
}
```