mansoorhamidzadeh committed
Commit • baaeb24
1 Parent(s): 1c36fee
Update README.md
README.md CHANGED
@@ -3,5 +3,112 @@ library_name: transformers
license: mit
language:
- fa
tags:
- named-entity-recognition
- ner
- nlp
- transformers
- persian
- farsi
- persian_ner
- bert
metrics:
- accuracy
pipeline_tag: token-classification
---

# Named-Entity-Recognition for Persian using Transformers

## Model Details

**Model Description:**
This Named-Entity-Recognition (NER) model identifies and classifies named entities in Persian (Farsi) text into six categories: person, location, group, corporation, product, and creative work. It is built with the Hugging Face Transformers library and fine-tuned from [PartAI/TookaBERT-Base](https://huggingface.co/PartAI/TookaBERT-Base).

**Intended Use:**
The model is intended for applications that need to identify and classify entities in Persian text, such as information retrieval, content analysis, and customer support automation.

**Model Architecture:**
- **Model Type:** Transformer-based token classification (NER)
- **Language:** Persian (fa)
- **Base Model:** [PartAI/TookaBERT-Base](https://huggingface.co/PartAI/TookaBERT-Base)

## Training Data

**Dataset:**
The model was trained on a diverse corpus of Persian text, with a training set of 15,000 sentences and a test set of 2,000 sentences.

**Data Preprocessing:**
- Text was normalized and cleaned for consistency.
- Text was tokenized with the BERT tokenizer.

## Training Procedure

**Training Configuration:**
- **Number of Epochs:** 4
- **Batch Size:** 8
- **Learning Rate:** 1e-5
- **Optimizer:** AdamW

**Training and Validation Losses:**
- **Epoch 1:**
  - Training Loss: 0.0610
  - Validation Loss: 0.0347
- **Epoch 2:**
  - Training Loss: 0.1363
  - Validation Loss: 0.0167
- **Epoch 3:**
  - Training Loss: 0.0327
  - Validation Loss: 0.0125
- **Epoch 4:**
  - Training Loss: 0.0016
  - Validation Loss: 0.0062

**Hardware:**
- **Training Environment:** NVIDIA P100 GPU
- **Training Time:** Approximately 1 hour
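
As a rough check on the training configuration listed above, the number of optimizer steps follows from the training-set size, batch size, and epoch count. The sketch below assumes one optimizer step per batch (no gradient accumulation) and that a final partial batch would be kept; neither detail is stated in this card.

```python
import math

# Values taken from the model card.
train_sentences = 15_000
batch_size = 8
num_epochs = 4

# Assumption: one optimizer step per batch (no gradient accumulation),
# and a final partial batch is kept rather than dropped.
steps_per_epoch = math.ceil(train_sentences / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch)  # 1875
print(total_steps)      # 7500
```

Under these assumptions the run covers 1,875 steps per epoch and 7,500 steps in total.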

## Model Prediction Tags
The model predicts the following tags:
- "O"
- "I-product"
- "I-person"
- "I-location"
- "I-group"
- "I-creative-work"
- "I-corporation"
- "B-product"
- "B-person"
- "B-location"
- "B-group"
- "B-creative-work"
- "B-corporation"
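
The `B-`, `I-`, and `O` prefixes are those of the BIO tagging scheme: `B-` marks the first token of an entity, `I-` a continuation of it, and `O` a token outside any entity. The sketch below shows one way a sequence of these tags could be grouped into entity spans. The function name `bio_to_spans` and the example tag sequence are illustrative and not part of the model's code.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (entity_type, start, end) spans; end is exclusive."""
    spans = []
    current_type, start = None, 0
    for i, tag in enumerate(tags):
        # Split at the first hyphen only, so "B-creative-work" keeps its type intact.
        prefix, _, entity_type = tag.partition("-")
        # An I- tag continues the open span only if the type matches.
        if prefix == "I" and entity_type == current_type:
            continue
        # Anything else closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type = None
        # B- opens a new span; a stray I- is treated as a span start too.
        if prefix in ("B", "I"):
            current_type, start = entity_type, i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Hypothetical tag sequence for a six-token sentence.
tags = ["B-person", "I-person", "O", "B-location", "O", "O"]
print(bio_to_spans(tags))
```

Treating a stray `I-` tag as the start of a new span is a design choice made here for robustness; a stricter decoder could discard such tags instead.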

## How To Use

```python
import torch
from transformers import pipeline

# Load the NER pipeline
ner_pipeline = pipeline("ner", model="NLPclass/Named-entity-recognition")

# Example text in Persian
text = "باراک اوباما در هاوایی متولد شد."

# Perform NER
entities = ner_pipeline(text)

# Output the entities
print(entities)
```
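
Without an aggregation strategy, the `ner` pipeline returns one dictionary per predicted token, with keys including `entity`, `score`, `word`, `start`, and `end`. The sketch below shows one way to keep only confident predictions and recover their surface text from the character offsets. The helper names, the 0.9 threshold, and the sample records (tags and scores) are invented for illustration and are not real output from this model.

```python
from typing import Dict, List, Tuple


def confident_entities(
    text: str, predictions: List[Dict], min_score: float = 0.9
) -> List[Tuple[str, str]]:
    """Return (surface_text, tag) pairs for predictions at or above min_score."""
    return [
        (text[p["start"]:p["end"]], p["entity"])
        for p in predictions
        if p["score"] >= min_score
    ]


text = "باراک اوباما در هاوایی متولد شد."


def fake_prediction(word: str, tag: str, score: float) -> Dict:
    # Builds a record shaped like the pipeline's per-token output.
    # The tags and scores passed in below are made up for this example.
    start = text.index(word)
    return {"entity": tag, "score": score, "word": word,
            "start": start, "end": start + len(word)}


predictions = [
    fake_prediction("باراک", "B-person", 0.99),
    fake_prediction("اوباما", "I-person", 0.98),
    fake_prediction("هاوایی", "B-location", 0.97),
    fake_prediction("متولد", "B-group", 0.41),  # low confidence, filtered out
]

for surface, tag in confident_entities(text, predictions):
    print(tag, surface)
```

To have the library merge `B-`/`I-` tokens into whole entities instead, the pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`).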

```bibtex
@misc{mansoorhamidzadeh,
  author = {mansoorhamidzadeh},
  title = {Named-Entity-Recognition for Persian using Transformers},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/mansoorhamidzadeh/Named-entity-recognition}},
}
```