metadata

library_name: span-marker
tags:
  - span-marker
  - token-classification
  - ner
  - named-entity-recognition
  - generated_from_span_marker_trainer
metrics:
  - precision
  - recall
  - f1
widget: []
pipeline_tag: token-classification
language:
  - ar

SpanMarker

This is a SpanMarker model for Named Entity Recognition (NER) in Arabic.

Model Details

Further details are available at https://iahlt.github.io/arabic_ner/.

Model Description

Model Type: SpanMarker
Maximum Sequence Length: 512 tokens
Maximum Entity Length: 150 words
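SpanMarker classifies candidate spans rather than tagging tokens one by one, so the maximum entity length caps which spans are considered at all. The sketch below is a simplified illustration of that span enumeration, assuming whitespace-split words; it is not SpanMarker's own implementation, and `enumerate_spans` is a hypothetical helper.

```python
from typing import List, Tuple


def enumerate_spans(words: List[str], max_entity_length: int = 150) -> List[Tuple[int, int]]:
    """Return every (start, end) word span with 1 <= end - start <= max_entity_length.

    `end` is exclusive, so words[start:end] is the candidate span.
    Hypothetical helper for illustration only; not part of SpanMarker.
    """
    spans = []
    for start in range(len(words)):
        # The longest span starting at `start` is capped both by the end of
        # the sentence and by the maximum entity length.
        for end in range(start + 1, min(start + max_entity_length, len(words)) + 1):
            spans.append((start, end))
    return spans


words = "زار الرئيس الفرنسي القاهرة".split()  # 4 words
print(len(enumerate_spans(words)))                       # 4 + 3 + 2 + 1 = 10
print(len(enumerate_spans(words, max_entity_length=2)))  # 4 + 3 = 7
```

Because the number of candidates grows with both the sentence length and the cap, a cap of 150 words means that for most sentences every contiguous span is a candidate.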

Tags

ANG - Any named language (Hebrew, Arabic, English, French, etc.)
DUC - Branded products, objects, vehicles, medicines, foods, etc. (Apple, BMW, Coca-Cola, etc.)
EVE - Any named event (Olympics, World Cup, etc.)
FAC - Any named facility, building, airport, etc. (Eiffel Tower, Ben Gurion Airport, etc.)
GPE - Geo-political entities: nation states, counties, cities, etc.
INFORMAL - Informal language (slang)
LOC - Non-GPE locations, geographical regions, mountain ranges, bodies of water, etc.
ORG - Companies, agencies, institutions, political parties, etc.
PER - People, including fictional characters.
TIMEX - Time expressions: absolute or relative dates or periods.
TTL - Any named title, position, profession, etc. (President, Prime Minister, etc.)
WOA - Any named work of art (books, movies, songs, etc.)
MISC - Miscellaneous entities that do not belong to any of the previous categories.
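When post-processing predictions it can help to keep the tag set in code. The sketch below copies the tags listed above into a dictionary, groups predicted entities by label, and reports labels outside that set. It assumes each prediction is a dict with `span` and `label` keys (the shape SpanMarker's `predict` returns for string input) and that the model's label names match the tag names above; `group_by_label` and `unknown_labels` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List

# Tag set as listed in this model card (descriptions shortened).
TAGS: Dict[str, str] = {
    "ANG": "Named language",
    "DUC": "Branded product, object, vehicle, medicine, food, etc.",
    "EVE": "Named event",
    "FAC": "Named facility, building, airport, etc.",
    "GPE": "Geo-political entity",
    "INFORMAL": "Informal language (slang)",
    "LOC": "Non-GPE location",
    "ORG": "Company, agency, institution, political party, etc.",
    "PER": "Person, including fictional characters",
    "TIMEX": "Time expression",
    "TTL": "Title, position, profession, etc.",
    "WOA": "Work of art",
    "MISC": "Miscellaneous entity",
}


def group_by_label(entities: List[dict]) -> Dict[str, List[str]]:
    """Group predicted span texts by their label, preserving input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


def unknown_labels(entities: List[dict]) -> List[str]:
    """Return (sorted) labels that are not in the documented tag set."""
    return sorted({entity["label"] for entity in entities} - set(TAGS))


# Hypothetical predictions, for illustration only.
sample = [
    {"span": "ماكرون", "label": "PER"},
    {"span": "القاهرة", "label": "GPE"},
    {"span": "الرئيس", "label": "TTL"},
]
for label, spans in group_by_label(sample).items():
    print(label, TAGS.get(label, "?"), spans)
print(unknown_labels(sample))
```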

Uses

Direct Use for Inference

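from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("iahlt/xlm-roberta-base-ar-ner-flat")
# Run inference on a string; the sentence is only an example
# ("French President Emmanuel Macron visited Cairo on Monday.")
entities = model.predict("زار الرئيس الفرنسي إيمانويل ماكرون القاهرة يوم الاثنين.")
print(entities)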
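For string input, SpanMarker's `predict` returns a list of dicts with `span`, `label`, `score`, `char_start_index` and `char_end_index` keys. The sketch below does not load the model; it shows one way to filter such predictions by confidence and to check that the character offsets slice back to the reported span. `filter_entities` and the 0.5 threshold are hypothetical choices, not part of SpanMarker, and the predictions are made up for illustration.

```python
from typing import List


def filter_entities(text: str, entities: List[dict], min_score: float = 0.5) -> List[dict]:
    """Keep predictions scoring at least `min_score` whose offsets match `text`."""
    kept = []
    for entity in entities:
        start, end = entity["char_start_index"], entity["char_end_index"]
        # char_end_index is exclusive, so text[start:end] should equal the span.
        if entity["score"] >= min_score and text[start:end] == entity["span"]:
            kept.append(entity)
    return kept


text = "زار ماكرون القاهرة"
# Hypothetical predictions, for illustration only.
predictions = [
    {"span": "ماكرون", "label": "PER", "score": 0.97, "char_start_index": 4, "char_end_index": 10},
    {"span": "القاهرة", "label": "GPE", "score": 0.31, "char_start_index": 11, "char_end_index": 18},
]
print([entity["span"] for entity in filter_entities(text, predictions)])
```

A suitable threshold depends on the data and on whether precision or recall matters more, so it is worth tuning on held-out examples rather than fixing it at 0.5.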

Training Details

Framework Versions

Python: 3.10.12
SpanMarker: 1.5.0
Transformers: 4.35.2
PyTorch: 2.1.0+cu121
Datasets: 2.16.1
Tokenizers: 0.15.1
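To reproduce results it can help to confirm that the local environment matches these versions. The sketch below compares installed versions against the list above using the standard library's `importlib.metadata`; it assumes the PyPI distribution names shown (for example `span-marker` and `torch`), ignores local build labels such as `+cu121`, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata
from typing import Callable, Dict

# Versions listed under "Framework Versions" (PyPI distribution names assumed).
EXPECTED: Dict[str, str] = {
    "span-marker": "1.5.0",
    "transformers": "4.35.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return {package: installed version or 'missing'} for packages that differ.

    Local version labels such as '+cu121' are ignored when comparing.
    """
    mismatches: Dict[str, str] = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            mismatches[package] = "missing"
            continue
        if installed.split("+")[0] != wanted.split("+")[0]:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    print(version_mismatches(EXPECTED) or "All listed versions match.")
```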

Citation

BibTeX

@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}