Update README.md
README.md
@@ -27,7 +27,7 @@ inference:
 
 ## Training data
 
-The underlying corpus, [NerKor+CARS-
+The underlying corpus, [NerKor+CARS-OntoNotes++](https://github.com/ppke-nlpg/NYTK-NerKor-Cars-OntoNotesPP), was derived from [NYTK-NerKor](https://github.com/nytud/NYTK-NerKor), a Hungarian gold standard named entity annotated corpus containing about 1 million tokens.
 It includes a small addition of 12k tokens of text (individual sentences) concerning motor vehicles (cars, buses, motorcycles) from the news archive of [hvg.hu](hvg.hu).
 While the annotation in NYTK-NerKor followed the CoNLL2002 labelling standard with just four NE categories (`PER`, `LOC`, `MISC`, `ORG`), this version of the corpus features over 30 entity types, including all entity types used in the [OntoNotes 5.0] English NER annotation.
 The new annotation elaborates on subtypes of the `LOC` and `MISC` entity types, and includes annotation for non-names like times and dates, quantities, languages and nationalities or religious or political groups. The annotation was elaborated with further entity subtypes not present in the Ontonotes 5 annotation (see below).
@@ -95,6 +95,6 @@ Further non-name entities:
 address = {Marseille, France},
 publisher = {European Language Resources Association},
 pages = {1907--1916},
-url = {
+url = {http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.203.pdf}
 }
 ```