---
license: cc-by-4.0
library_name: span-marker
base_model: gwlms/teams-base-dewiki-v1-discriminator
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
pipeline_tag: token-classification
widget:
- text: "Jürgen Schmidhuber studierte ab 1983 Informatik und Mathematik an der TU München ."
  example_title: "Wikipedia"
datasets:
- gwlms/germeval2014
language:
- de
model-index:
- name: SpanMarker with GWLMS TEAMS on GermEval 2014 NER Dataset by Stefan Schweter (@stefan-it)
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: gwlms/germeval2014
      name: GermEval 2014
      split: test
      revision: f3647c56803ce67c08ee8d15f4611054c377b226
    metrics:
    - type: f1
      value: 0.8781
      name: F1
metrics:
- f1
---

# SpanMarker for GermEval 2014 NER

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that was fine-tuned on the [GermEval 2014 NER Dataset](https://sites.google.com/site/germeval2014ner/home).

The GermEval 2014 NER Shared Task builds on a new dataset with German Named Entity annotation with the following properties:

- The data was sampled from German Wikipedia and News Corpora as a collection of citations.
- The dataset covers over 31,000 sentences corresponding to over 590,000 tokens.
- The NER annotation uses the NoSta-D guidelines, which extend the Tübingen Treebank guidelines, using four main NER categories with sub-structure, and annotating embeddings among NEs such as `[ORG FC Kickers [LOC Darmstadt]]`.

12 classes of Named Entities are annotated and must be recognized: the four main classes `PER`son, `LOC`ation, `ORG`anisation, and `OTH`er, plus two fine-grained subclasses for each of them. `-deriv` marks derivations from NEs such as "englisch" ("English"), and `-part` marks compounds that include a NE as a subsequence, such as "deutschlandweit" ("Germany-wide").

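The count of 12 classes follows from combining the four main classes with three variants each (plain, `-deriv`, `-part`). The short sketch below enumerates them. The concatenated naming scheme (for example `LOCderiv`) and the helper `build_label_set` are assumptions made for illustration; the exact label strings stored in the dataset may differ.

```python
from itertools import product

# The four main classes named in the dataset description.
MAIN_CLASSES = ["PER", "LOC", "ORG", "OTH"]

# "" is the plain entity, "deriv" a derivation, "part" a compound containing a NE.
VARIANTS = ["", "deriv", "part"]


def build_label_set(main_classes=MAIN_CLASSES, variants=VARIANTS):
    """Return every main-class/variant combination (hypothetical naming scheme)."""
    return [f"{main}{variant}" for main, variant in product(main_classes, variants)]


labels = build_label_set()
print(len(labels))
print(labels)
```

Four main classes times three variants gives the 12 classes mentioned above.
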
# Fine-Tuning

We use the same hyper-parameters as in the ["German's Next Language Model"](https://aclanthology.org/2020.coling-main.598/) paper, with the [GWLMS TEAMS](https://huggingface.co/gwlms/teams-base-dewiki-v1-discriminator) model as backbone.

Evaluation is performed with SpanMarker's internal evaluation code, which uses `seqeval`.

We fine-tune 5 models and upload the model with the best F1 score on the development set. Results on the development set are in parentheses:

| Model       | Run 1           | Run 2           | Run 3           | Run 4               | Run 5           | Avg.            |
| ----------- | --------------- | --------------- | --------------- | ------------------- | --------------- | --------------- |
| GWLMS TEAMS | (88.76) / 87.85 | (88.54) / 87.77 | (88.41) / 87.98 | (**88.86**) / 87.81 | (88.83) / 88.50 | (88.68) / 87.98 |

The uploaded model (Run 4, selected by its development score) achieves a final test F1 score of 87.81%.

Scripts for [training](trainer.py) and [evaluation](evaluator.py) are also available.

# Usage

The fine-tuned model can be used as follows:

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("gwlms/span-marker-teams-germeval14")

# Run inference
entities = model.predict("Jürgen Schmidhuber studierte ab 1983 Informatik und Mathematik an der TU München .")
```
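
As a rough sketch of what could be done with the predictions, the snippet below groups spans by label and drops low-confidence ones. It assumes each prediction is a dictionary with `span`, `label`, and `score` keys; `group_entities`, the `min_score` threshold, and the sample list are hypothetical. The sample values are hand-written placeholders, not real output of this model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group predicted spans by label, keeping only scores >= min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hand-written placeholder predictions (NOT real model output).
example_entities = [
    {"span": "Jürgen Schmidhuber", "label": "PER", "score": 0.99},
    {"span": "TU München", "label": "ORG", "score": 0.95},
    {"span": "Informatik", "label": "OTH", "score": 0.20},
]

grouped = group_entities(example_entities, min_score=0.5)
for label, spans in grouped.items():
    print(f"{label}: {spans}")
```

In practice, the list returned by `model.predict(...)` would take the place of `example_entities`, and a suitable threshold depends on the application.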