---
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
metrics:
- precision
- recall
- f1
widget: []
pipeline_tag: token-classification
language:
- ar
---

# SpanMarker

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Arabic Named Entity Recognition.

## Model Details

Further details are available at https://iahlt.github.io/arabic_ner/.

### Model Description
- **Model Type:** SpanMarker
<!-- - **Encoder:** [Unknown](https://huggingface.co./unknown) -->
- **Maximum Sequence Length:** 512 tokens
- **Maximum Entity Length:** 150 words
<!-- - **Training Dataset:** [Unknown](https://huggingface.co./datasets/unknown) -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Tags
```
ANG - Any named language (Hebrew, Arabic, English, French, etc.)
DUC - Branded products, objects, vehicles, medicines, foods, etc. (Apple, BMW, Coca-Cola, etc.)
EVE - Any named event (Olympics, World Cup, etc.)
FAC - Any named facility, building, airport, etc. (Eiffel Tower, Ben Gurion Airport, etc.)
GPE - Geo-political entity, nation states, counties, cities, etc.
INFORMAL - Informal language (slang)
LOC - Non-GPE locations, geographical regions, mountain ranges, bodies of water, etc.
ORG - Companies, agencies, institutions, political parties, etc.
PER - People, including fictional.
TIMEX - Time expression, absolute or relative dates or periods.
TTL - Any named title, position, profession, etc. (President, Prime Minister, etc.)
WOA - Any named work of art (books, movies, songs, etc.)
MISC - Miscellaneous entities that do not belong to any of the previous categories
```
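The tag inventory above can be mirrored in code, for example to summarize predictions or to catch labels you did not expect. The following is a minimal sketch, assuming each predicted entity is a dict with a `"label"` key (as in SpanMarker's `predict` output); `TAGS` and `summarize_labels` are hypothetical names, and the sample predictions are made up for illustration.

```python
from collections import Counter

# Tag inventory copied from the list above.
TAGS = {
    "ANG", "DUC", "EVE", "FAC", "GPE", "INFORMAL", "LOC",
    "ORG", "PER", "TIMEX", "TTL", "WOA", "MISC",
}


def summarize_labels(entities):
    """Count predicted entities per tag and list labels outside TAGS.

    Assumes `entities` is a list of dicts that each carry a "label" key.
    """
    counts = Counter(entity["label"] for entity in entities)
    unknown = sorted(label for label in counts if label not in TAGS)
    return counts, unknown


# Hypothetical predictions, for illustration only.
sample = [
    {"span": "نجيب محفوظ", "label": "PER"},
    {"span": "القاهرة", "label": "GPE"},
    {"span": "1911", "label": "TIMEX"},
]
counts, unknown = summarize_labels(sample)
print(counts["PER"], unknown)  # 1 []
```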

## Uses

### Direct Use for Inference

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
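model = SpanMarkerModel.from_pretrained("iahlt/xlm-roberta-base-ar-ner-flat")

# Example input: "Naguib Mahfouz was born in Cairo in 1911."
text = "ولد نجيب محفوظ في القاهرة عام 1911."
# Returns a list of dicts, one per predicted entity span
entities = model.predict(text)
print(entities)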
```
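In practice you may want to drop low-confidence spans and recover the exact surface text of each entity. The sketch below assumes the dict keys SpanMarker returns for plain-string input (`"label"`, `"score"`, `"char_start_index"`, `"char_end_index"`); `filter_entities` and the 0.5 default threshold are hypothetical choices, and the sample predictions are fabricated for illustration rather than real model output.

```python
def filter_entities(text, entities, min_score=0.5):
    """Keep entities scoring at least `min_score`, with their text slices.

    Assumes each entity dict has "label", "score", "char_start_index"
    and "char_end_index" keys, as for SpanMarker's string-input output.
    """
    kept = []
    for entity in entities:
        if entity["score"] < min_score:
            continue
        start, end = entity["char_start_index"], entity["char_end_index"]
        kept.append((text[start:end], entity["label"], entity["score"]))
    # Highest-confidence entities first.
    return sorted(kept, key=lambda item: item[2], reverse=True)


# Hypothetical input and predictions, for illustration only.
text = "ولد نجيب محفوظ في القاهرة"
sample = []
for surface, label, score in [
    ("القاهرة", "GPE", 0.91),
    ("نجيب محفوظ", "PER", 0.98),
    ("في", "MISC", 0.2),
]:
    start = text.index(surface)  # character offsets into `text`
    sample.append({
        "span": surface,
        "label": label,
        "score": score,
        "char_start_index": start,
        "char_end_index": start + len(surface),
    })

for surface, label, score in filter_entities(text, sample):
    print(label, surface, score)
```

Slicing the original string with the character offsets, rather than trusting the `"span"` field alone, keeps the output aligned with the input text if you later need to highlight entities in place.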

## Training Details

### Framework Versions
- Python: 3.10.12
- SpanMarker: 1.5.0
- Transformers: 4.35.2
- PyTorch: 2.1.0+cu121
- Datasets: 2.16.1
- Tokenizers: 0.15.1
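To reproduce this environment, it can help to compare installed package versions against the list above. The following is a minimal sketch using the standard-library `importlib.metadata`; the `EXPECTED` mapping (keyed by PyPI distribution names) and the `version_mismatches` helper are hypothetical, and exact string matching is a deliberate simplification, since other versions may well work.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework Versions"; keys are PyPI distribution names.
EXPECTED = {
    "span-marker": "1.5.0",
    "transformers": "4.35.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed)} for packages that differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches().items():
        print(f"{package}: expected {wanted}, found {installed}")
```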

## Citation

### BibTeX

```
@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}
```