Update README.md
README.md
CHANGED
@@ -8,3 +8,34 @@ language: de
widget:
- text: "Namlich das Hanns Mulheim zer wirtshus zu Buchse sol gredt haben von Herren von Bern habind die von Zürich verratten oder wollend sy verratten."
---

# Turmbücher NER

A named entity recognition (NER) model developed by Ismail Prada Ziegler as part of a research project at Digital Humanities, University of Bern.

## Performance

| | PER | ORG | LOC | Micro-Avg |
| :---: | :---: | :---: | :---: | :---: |
| Precision | 82.46% | 28.81% | 88.51% | 81.21% |
| Recall | 88.51% | 44.74% | 83.02% | 83.99% |
| F1-Score | 85.38% | 35.05% | 85.67% | 82.57% |

Note: ORG tags were annotated too inconsistently in the training data, so the model performs poorly on them.
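
The F1 row can be cross-checked against the precision and recall rows. The sketch below assumes the usual definition of F1 as the harmonic mean of precision and recall; the README does not say how the scores were computed, and the function names and the tolerance are illustrative choices. Because the table rounds every value to two decimals, the recomputed F1 may differ from the reported one in the last digit.

```python
# Values in percent, copied from the table above: (precision, recall, reported F1).
REPORTED = {
    "PER": (82.46, 88.51, 85.38),
    "ORG": (28.81, 44.74, 35.05),
    "LOC": (88.51, 83.02, 85.67),
    "Micro-Avg": (81.21, 83.99, 82.57),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_table(rows, tolerance=0.02):
    """Recompute F1 per column and compare it with the reported value.

    The tolerance (an assumption) absorbs rounding in the published figures.
    Returns {label: (recomputed_f1, reported_f1, within_tolerance)}.
    """
    result = {}
    for label, (precision, recall, reported_f1) in rows.items():
        recomputed = f1_score(precision, recall)
        within = abs(recomputed - reported_f1) <= tolerance
        result[label] = (recomputed, reported_f1, within)
    return result


if __name__ == "__main__":
    for label, (recomputed, stated, ok) in check_table(REPORTED).items():
        print(f"{label:10s} recomputed={recomputed:.2f} reported={stated:.2f} consistent={ok}")
```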

In initial experiments, the model also performed reasonably well on automatically transcribed text with a character error rate (CER) of around 5%.
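
For readers unfamiliar with the metric, here is a minimal sketch of how a CER can be computed. It assumes the common definition (Levenshtein edit distance between reference and automatic transcription, divided by the reference length); the README does not state how the 5% figure was measured. The "automatic" transcription in the example is made up for illustration and is not real system output.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,          # delete char_a
                current[j - 1] + 1,       # insert char_b
                previous[j - 1] + cost,   # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "zer wirtshus zu Buchse"   # excerpt from the widget sentence
    hypothesis = "zer wirtshns zu Buchse"  # invented transcription with one wrong character
    print(f"CER = {cer(reference, hypothesis):.1%}")
```

With one substituted character in a 22-character reference, the example comes out at 1/22, roughly 4.5%, which is the order of magnitude mentioned above.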

## Data Set

Main data set: [Berner Turmbücher](https://www.polit-forum-bern.ch/turmbuecher/), early volumes from the 16th century, Early New High German, 61k tokens of training data.

Secondary data sets:
- [SSRQ](https://www.ssrq-sds-fds.ch/home/) - Fribourg, language model + tagging, 59k tokens.
- [Chorgerichtsmanuale](https://www.adfontes.uzh.ch/370540/training/deutsche-transkriptionsuebungen/chorgerichtsmanuale-einleitung) (unpublished), language model + tagging, 76k tokens.
- [Königsfelden Charters](https://www.koenigsfelden.uzh.ch/), language model, 623k tokens.
- Talgerichtsprotokolle (unpublished), language model, 438k tokens.
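
To get a feel for how much material stands behind each use, the token counts listed above can be tallied per stated purpose. This is only a bookkeeping sketch: the use labels are copied from the list as written, the main data set is kept under its own "training" label because the README does not say whether it was also used for the language model, and the variable and function names are illustrative.

```python
from collections import defaultdict

# (name, tokens in thousands, uses as stated in the README)
CORPORA = [
    ("Berner Turmbücher", 61, ("training",)),
    ("SSRQ", 59, ("language model", "tagging")),
    ("Chorgerichtsmanuale", 76, ("language model", "tagging")),
    ("Königsfelden Charters", 623, ("language model",)),
    ("Talgerichtsprotokolle", 438, ("language model",)),
]


def tokens_per_use(rows):
    """Sum token counts (in thousands) for every stated use."""
    totals = defaultdict(int)
    for _name, k_tokens, uses in rows:
        for use in uses:
            totals[use] += k_tokens
    return dict(totals)


if __name__ == "__main__":
    for use, k_tokens in sorted(tokens_per_use(CORPORA).items()):
        print(f"{use}: {k_tokens}k tokens")
```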

## Notice

This project is still in progress.