---
language: en
datasets:
- wnut_17
license: mit
metrics:
- f1
widget:
- text: "Manchester played Liverpool last night in Liverpool."
  example_title: "Metonyms"
- text: "i live in brum - slang for birmingham"
  example_title: "Slang / informal text"
---
# Reddit NER for place names
Fine-tuned `bert-base-uncased` for named entity recognition, trained on `wnut_17` with 498 additional comments from Reddit. This model is intended solely for place name extraction from social media text, so all other entity types have been removed.
This model was created with two key goals:
1. Improve NER results on social media text
2. Target only place names
## Model code
For the model code, see the [model GitHub repository](https://github.com/cjber/reddit-model).
## Metonymy
In theory, this model should be able to detect and ignore metonyms. For example, in the sentence:
`Manchester played Liverpool last night in Liverpool.`
Both Manchester and the first Liverpool mention refer to football teams, so the model tags only the final Liverpool mention:
```python
[
    {
        "entity_group": "location",
        "score": 0.9975672,
        "word": "liverpool",
        "start": 42,
        "end": 51,
    }
]
```
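The `start` and `end` character offsets make it possible to check which mention was tagged. The following is a minimal sketch that uses the output above as hard-coded data, so no model is loaded; `mention_index` is a hypothetical helper written for illustration, not part of the model repository.

```python
import re

text = "Manchester played Liverpool last night in Liverpool."
# Output copied from the example above; hard-coded so no model download is needed.
entities = [
    {
        "entity_group": "location",
        "score": 0.9975672,
        "word": "liverpool",
        "start": 42,
        "end": 51,
    }
]


def mention_index(text: str, start: int, end: int) -> int:
    """Return the 0-based rank of text[start:end] among its
    case-insensitive occurrences in text (hypothetical helper)."""
    surface = text[start:end]
    starts = [
        m.start()
        for m in re.finditer(re.escape(surface), text, flags=re.IGNORECASE)
    ]
    return starts.index(start)


entity = entities[0]
# Slicing the input recovers the original casing, which the uncased model drops.
surface = text[entity["start"]:entity["end"]]
which = mention_index(text, entity["start"], entity["end"])
print(surface, which)
```

Here `which` is a 0-based index, so a value of 1 corresponds to the second "Liverpool" in the sentence, the one used as a place name.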
## Use in `transformers`
```python
from transformers import pipeline

# Token-classification pipeline; aggregation merges word pieces into whole entities.
generator = pipeline(
    task="ner",
    model="cjber/reddit-ner-place_names",
    tokenizer="cjber/reddit-ner-place_names",
    aggregation_strategy="first",
)

out = generator("I like reading books. I live in Reading.")
```
`out` gives:
```python
[
    {
        "entity_group": "location",
        "score": 0.94123614,
        "word": "reading",
        "start": 32,
        "end": 39,
    }
]
```
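Because the model is uncased, `word` is lowercased; the original surface form can be recovered by slicing the input with `start` and `end`. The sketch below shows one possible post-processing step, using the output above as hard-coded data. The helper `extract_places` and the `min_score` threshold of 0.5 are assumptions for illustration, not part of the model or of `transformers`.

```python
from __future__ import annotations

from typing import Any


def extract_places(
    text: str, entities: list[dict[str, Any]], min_score: float = 0.5
) -> list[str]:
    """Return original-case place names from aggregated NER output.

    Hypothetical helper: min_score is an arbitrary illustrative cutoff.
    """
    return [
        text[e["start"]:e["end"]]
        for e in entities
        if e["entity_group"] == "location" and e["score"] >= min_score
    ]


text = "I like reading books. I live in Reading."
# Output copied from the example above; hard-coded so no model download is needed.
out = [
    {
        "entity_group": "location",
        "score": 0.94123614,
        "word": "reading",
        "start": 32,
        "end": 39,
    }
]

places = extract_places(text, out)
print(places)
```

In practice `out` would come straight from the `generator(...)` call, and the threshold would be tuned to the precision required downstream.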