metadata
language: en
datasets:
- wnut_17
license: mit
metrics:
- f1
widget:
- text: My name is Sylvain and I live in Paris
  example_title: Parisian
- text: My name is Sarah and I live in London
  example_title: Londoner
Reddit NER for place names
Fine-tuned bert-base-uncased for named entity recognition, trained on wnut_17 with 498 additional comments from Reddit. This model is intended solely for place name extraction from social media text; all other entity types have therefore been removed.
This model was created with two key goals:
- Improve NER results on social media text
- Target only place names
In theory, this model should be able to detect and ignore metonyms, i.e. place names used to refer to something other than the place itself. For example, in the sentence:
Manchester played Liverpool last night in London.
Both Manchester and Liverpool refer to football teams rather than places, so the model outputs only London:
[ { "entity_group": "location", "score": 0.99784255027771, "word": "london", "start": 42, "end": 48 } ]
Use in transformers
from transformers import pipeline
generator = pipeline(
    task="ner",
    model="cjber/reddit-ner-place_names",
    tokenizer="cjber/reddit-ner-place_names",
    aggregation_strategy="first",
)
out = generator("I live north of liverpool in Waterloo")
The value of out is:
[{'entity_group': 'location',
'score': 0.94054973,
'word': 'liverpool',
'start': 16,
'end': 25},
{'entity_group': 'location',
'score': 0.99520856,
'word': 'waterloo',
'start': 29,
'end': 37}]
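
Because the base model is uncased, the word field is returned in lowercase ("waterloo" rather than "Waterloo"). The start and end offsets index into the original input string, so the original casing can be recovered by slicing. The following is a minimal sketch, assuming the entities have the shape shown above; the helper name extract_place_names and the min_score threshold are hypothetical and not part of this model or of transformers.

```python
from typing import Dict, List


def extract_place_names(text: str, entities: List[Dict], min_score: float = 0.9) -> List[str]:
    """Return place names as written in `text`, keeping only confident predictions.

    `min_score` is a hypothetical threshold chosen for illustration only.
    """
    places = []
    for entity in entities:
        # Skip predictions below the confidence threshold.
        if entity["score"] < min_score:
            continue
        # Slice the original text so the source casing is preserved;
        # the "word" field is lowercased by the uncased tokenizer.
        places.append(text[entity["start"]:entity["end"]])
    return places


text = "I live north of liverpool in Waterloo"
# Entities copied from the pipeline output shown above.
entities = [
    {"entity_group": "location", "score": 0.94054973, "word": "liverpool", "start": 16, "end": 25},
    {"entity_group": "location", "score": 0.99520856, "word": "waterloo", "start": 29, "end": 37},
]

print(extract_place_names(text, entities))  # ['liverpool', 'Waterloo']
```

Raising min_score (for example to 0.95) would drop the lower-scoring "liverpool" prediction; a suitable threshold depends on the application and is not specified by this model card.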