HooshvareLab
/

bert-fa-zwnj-base-ner

Token Classification

Inference Endpoints

Model card Files Files and versions Community

bert-fa-zwnj-base-ner / README.md

m3hrdadfi's picture

Initialize

52bce73 over 3 years ago

|

3.7 kB

	---
	language: fa
	---


	# BertNER

	This model fine-tuned for the Named Entity Recognition (NER) task on a mixed NER dataset collected from [ARMAN](https://github.com/HaniehP/PersianNER), [PEYMA](http://nsurl.org/2019-2/tasks/task-7-named-entity-recognition-ner-for-farsi/), and [WikiANN](https://elisa-ie.github.io/wikiann/) that covered ten types of entities:

	- Date (DAT)
	- Event (EVE)
	- Facility (FAC)
	- Location (LOC)
	- Money (MON)
	- Organization (ORG)
	- Percent (PCT)
	- Person (PER)
	- Product (PRO)
	- Time (TIM)


	## Dataset Information

	\| \| Records \| B-DAT \| B-EVE \| B-FAC \| B-LOC \| B-MON \| B-ORG \| B-PCT \| B-PER \| B-PRO \| B-TIM \| I-DAT \| I-EVE \| I-FAC \| I-LOC \| I-MON \| I-ORG \| I-PCT \| I-PER \| I-PRO \| I-TIM \|
	\|:------\|----------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|--------:\|
	\| Train \| 29133 \| 1423 \| 1487 \| 1400 \| 13919 \| 417 \| 15926 \| 355 \| 12347 \| 1855 \| 150 \| 1947 \| 5018 \| 2421 \| 4118 \| 1059 \| 19579 \| 573 \| 7699 \| 1914 \| 332 \|
	\| Valid \| 5142 \| 267 \| 253 \| 250 \| 2362 \| 100 \| 2651 \| 64 \| 2173 \| 317 \| 19 \| 373 \| 799 \| 387 \| 717 \| 270 \| 3260 \| 101 \| 1382 \| 303 \| 35 \|
	\| Test \| 6049 \| 407 \| 256 \| 248 \| 2886 \| 98 \| 3216 \| 94 \| 2646 \| 318 \| 43 \| 568 \| 888 \| 408 \| 858 \| 263 \| 3967 \| 141 \| 1707 \| 296 \| 78 \|


	## Evaluation

	The following tables summarize the scores obtained by model overall and per each class.

	Overall

	\| Model \| accuracy \| precision \| recall \| f1 \|
	\|:----------:\|:--------:\|:---------:\|:--------:\|:--------:\|
	\| Bert \| 0.995086 \| 0.953454 \| 0.961113 \| 0.957268 \|


	Per entities

	\| \| number \| precision \| recall \| f1 \|
	\|:---: \|:------: \|:---------: \|:--------: \|:--------: \|
	\| DAT \| 407 \| 0.860636 \| 0.864865 \| 0.862745 \|
	\| EVE \| 256 \| 0.969582 \| 0.996094 \| 0.982659 \|
	\| FAC \| 248 \| 0.976190 \| 0.991935 \| 0.984000 \|
	\| LOC \| 2884 \| 0.970232 \| 0.971914 \| 0.971072 \|
	\| MON \| 98 \| 0.905263 \| 0.877551 \| 0.891192 \|
	\| ORG \| 3216 \| 0.939125 \| 0.954602 \| 0.946800 \|
	\| PCT \| 94 \| 1.000000 \| 0.968085 \| 0.983784 \|
	\| PER \| 2645 \| 0.965244 \| 0.965974 \| 0.965608 \|
	\| PRO \| 318 \| 0.981481 \| 1.000000 \| 0.990654 \|
	\| TIM \| 43 \| 0.692308 \| 0.837209 \| 0.757895 \|


	## How To Use
	You use this model with Transformers pipeline for NER.

	### Installing requirements

	```bash
	pip install transformers
	```

	### How to predict using pipeline

	```python
	from transformers import AutoTokenizer
	from transformers import AutoModelForTokenClassification # for pytorch
	from transformers import TFAutoModelForTokenClassification # for tensorflow
	from transformers import pipeline


	model_name_or_path = "HooshvareLab/bert-fa-zwnj-base-ner"
	tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
	model = AutoModelForTokenClassification.from_pretrained(model_name_or_path) # Pytorch
	# model = TFAutoModelForTokenClassification.from_pretrained(model_name_or_path) # Tensorflow

	nlp = pipeline("ner", model=model, tokenizer=tokenizer)
	example = "در سال ۲۰۱۳ درگذشت و آندرتیکر و کین برای او مراسم یادبود گرفتند."

	ner_results = nlp(example)
	print(ner_results)
	```


	## Questions?
	Post a Github issue on the [ParsNER Issues](https://github.com/hooshvare/parsner/issues) repo.