iprada committed on
Commit
b147a00
·
1 Parent(s): 30394e9

Update README.md

  1. README.md +31 -0
README.md CHANGED
@@ -8,3 +8,34 @@ language: de
  widget:
  - text: "Namlich das Hanns Mulheim zer wirtshus zu Buchse sol gredt haben von Herren von Bern habind die von Zürich verratten oder wollend sy verratten."
  ---
+
+ # Turmbücher NER
+
+ A named entity recognition (NER) model developed by Ismail Prada Ziegler as part of a research project at Digital Humanities, University of Bern.
+
+ ## Performance
+
+ | Metric | PER | ORG | LOC | Micro-Avg |
+ | :---: | :---: | :---: | :---: | :---: |
+ | Precision | 82.46% | 28.81% | 88.51% | 81.21% |
+ | Recall | 88.51% | 44.74% | 83.02% | 83.99% |
+ | F1-Score | 85.38% | 35.05% | 85.67% | 82.57% |
+
+ Note: ORG tags were too inconsistent in the training data, so the model performs poorly on them.
+
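The F1 row of the Performance table can be cross-checked against the precision and recall rows. The sketch below assumes the usual definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` and the `REPORTED` dictionary are illustrative and not part of the model's code. Because the table values are already rounded, a recomputed score may differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, here percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the Performance table.
REPORTED = {
    "PER": (82.46, 88.51, 85.38),
    "ORG": (28.81, 44.74, 35.05),
    "LOC": (88.51, 83.02, 85.67),
    "Micro-Avg": (81.21, 83.99, 82.57),
}

if __name__ == "__main__":
    for label, (p, r, reported) in REPORTED.items():
        print(f"{label}: recomputed {f1_score(p, r):.2f}%, reported {reported:.2f}%")
```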
+ In initial experiments, we found that the model also performs reasonably well on automatically transcribed text with a character error rate (CER) of around 5%.
+
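For readers unfamiliar with the metric, CER is commonly computed as the character-level edit distance between a reference transcription and the automatic transcription, divided by the length of the reference. The sketch below follows that common definition. It is an assumption that the project measured CER this way, and the hypothesis string is an invented example (a single long-s style confusion in a phrase from the widget text), not project data.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1      # drop a reference character
            insertion = current[j - 1] + 1  # add a hypothesis character
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "zer wirtshus zu Buchse"   # phrase taken from the widget example
    hypothesis = "zer wirtshus zu Buchfe"  # invented transcription error
    print(f"CER: {cer(reference, hypothesis):.2%}")
```

One wrong character in this 22-character phrase already gives a CER close to the 5% mentioned above, which gives a feel for how noisy such input is.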
+ ## Data Set
+
+ Main data set: [Berner Turmbücher](https://www.polit-forum-bern.ch/turmbuecher/), early volumes from the 16th century, written in Early New High German, 61k tokens of training data.
+
+ Secondary data sets:
+ - [SSRQ](https://www.ssrq-sds-fds.ch/home/) - Fribourg, language model + tagging, 59k tokens.
+ - [Chorgerichtsmanuale](https://www.adfontes.uzh.ch/370540/training/deutsche-transkriptionsuebungen/chorgerichtsmanuale-einleitung) (unpublished), language model + tagging, 76k tokens.
+ - [Königsfelden Charters](https://www.koenigsfelden.uzh.ch/), language model, 623k tokens.
+ - Talgerichtsprotokolle (unpublished), language model, 438k tokens.
+
+ ## Notice
+
+ This project is still in progress.
+