Update README.md
README.md
CHANGED
@@ -25,10 +25,12 @@ inference:
 
 - max_seq_length = 448
 
-
+## Training data
+
+The underlying corpus, [NerKor+CARS-ONPP](https://github.com/novakat/NYTK-NerKor-Cars-OntoNotesPP), was derived from [NYTK-NerKor](https://github.com/nytud/NYTK-NerKor), a Hungarian gold standard named entity annotated corpus containing about 1 million tokens.
 It includes a small addition of 12k tokens of text (individual sentences) concerning motor vehicles (cars, buses, motorcycles) from the news archive of [hvg.hu](hvg.hu).
 While the annotation in NYTK-NerKor followed the CoNLL2002 labelling standard with just four NE categories (`PER`, `LOC`, `MISC`, `ORG`), this version of the corpus features over 30 entity types, including all entity types used in the [OntoNotes 5.0] English NER annotation.
-The new annotation elaborates on subtypes of the `LOC` and `MISC` entity types, and includes annotation for non-names like times and dates, quantities, languages and nationalities or religious or political groups.
+The new annotation elaborates on subtypes of the `LOC` and `MISC` entity types, and includes annotation for non-names like times and dates, quantities, languages and nationalities or religious or political groups. The annotation was elaborated with further entity subtypes not present in the Ontonotes 5 annotation (see below).
 
 ## Tags derived from the OntoNotes 5.0 annotation
 