mansoorhamidzadeh committed on
Commit
baaeb24
1 Parent(s): 1c36fee

Update README.md

Files changed (1)
  1. README.md +108 -1
README.md CHANGED
@@ -3,5 +3,112 @@ library_name: transformers
3
  license: mit
4
  language:
5
  - fa
6
  pipeline_tag: token-classification
7
- ---
3
  license: mit
4
  language:
5
  - fa
6
+ tags:
7
+ - named-entity-recognition
8
+ - ner
9
+ - nlp
10
+ - transformers
11
+ - persian
12
+ - farsi
13
+ - persian_ner
14
+ - bert
15
+ metrics:
16
+ - accuracy
17
  pipeline_tag: token-classification
18
+ ---
19
+
20
+ # Named-Entity-Recognition for Persian using Transformers
21
+
22
+ ## Model Details
23
+
24
+ **Model Description:**
25
+ This Named-Entity-Recognition (NER) model identifies named entities in Persian (Farsi) text and classifies them into predefined categories: persons, locations, groups, corporations, products, and creative works. It is built with the Hugging Face Transformers library by fine-tuning the [PartAI/TookaBERT-Base](https://huggingface.co/PartAI/TookaBERT-Base) model.
26
+
27
+ **Intended Use:**
28
+ The model is intended for applications that need to identify and classify entities in Persian text, such as information retrieval, content analysis, and customer support automation.
29
+
30
+ **Model Architecture:**
31
+ - **Model Type:** Transformer-based token classification (NER)
32
+ - **Language:** Persian (fa)
33
+ - **Base Model:** [PartAI/TookaBERT-Base](https://huggingface.co/PartAI/TookaBERT-Base)
34
+
35
+ ## Training Data
36
+
37
+ **Dataset:**
38
+ The model was trained on a diverse corpus of Persian text, split into a training set of 15,000 sentences and a test set of 2,000 sentences.
39
+
40
+ **Data Preprocessing:**
41
+ - Text normalization and cleaning were performed to ensure consistency.
42
+ - Tokenization was done using the BERT tokenizer.
43
+
44
+ ## Training Procedure
45
+
46
+ **Training Configuration:**
47
+ - **Number of Epochs:** 4
48
+ - **Batch Size:** 8
49
+ - **Learning Rate:** 1e-5
50
+ - **Optimizer:** AdamW
51
+
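As a rough sanity check on the configuration above, the number of optimizer steps can be worked out from the dataset size and batch size stated in this card. This is only a sketch: it assumes one optimizer step per batch (no gradient accumulation, a single device), which the card does not state.

```python
import math

# Values taken from this model card.
train_sentences = 15_000
batch_size = 8
epochs = 4

# Assumption (not stated in the card): one optimizer step per batch,
# i.e. no gradient accumulation and a single training device.
steps_per_epoch = math.ceil(train_sentences / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 1875 7500
```

Under that assumption, the reported training time of about one hour covers roughly 7,500 update steps.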
52
+ **Training and Validation Losses:**
53
+ - **Epoch 1:**
54
+   - Loss: 0.0610
55
+   - Validation Loss: 0.0347
56
+ - **Epoch 2:**
57
+   - Loss: 0.1363
58
+   - Validation Loss: 0.0167
59
+ - **Epoch 3:**
60
+   - Loss: 0.0327
61
+   - Validation Loss: 0.0125
62
+ - **Epoch 4:**
63
+   - Loss: 0.0016
64
+   - Validation Loss: 0.0062
65
+
66
+ **Hardware:**
67
+ - **Training Environment:** NVIDIA P100 GPU
68
+ - **Training Time:** Approximately 1 hour
69
+
70
+ ## Model Prediction Tags
71
+ The model predicts the following tags:
72
+ - "O"
73
+ - "I-product"
74
+ - "I-person"
75
+ - "I-location"
76
+ - "I-group"
77
+ - "I-creative-work"
78
+ - "I-corporation"
79
+ - "B-product"
80
+ - "B-person"
81
+ - "B-location"
82
+ - "B-group"
83
+ - "B-creative-work"
84
+ - "B-corporation"
85
+
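The tags above follow the usual B-/I-/O convention: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. The helper below is a minimal sketch of how a tag sequence could be turned into entity spans. The function name `bio_to_spans` is hypothetical and not part of this model or of Transformers, and the policy of treating a stray `I-` tag as the start of a new span is an assumption.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (entity_type, start, end) spans (end exclusive)."""
    spans = []
    current_type, start = None, None
    for i, tag in enumerate(tags):
        # "B-creative-work" -> ("B", "-", "creative-work"); "O" -> ("O", "", "")
        prefix, _, label = tag.partition("-")
        # Only an "I-x" tag matching the open span's type continues it.
        continues = prefix == "I" and label == current_type
        if not continues:
            if current_type is not None:
                spans.append((current_type, start, i))
            # "B-x" opens a new span; a stray "I-x" is also treated as an opening
            # (an assumption of this sketch); "O" leaves no span open.
            current_type, start = (label, i) if prefix in ("B", "I") else (None, None)
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


example = ["B-person", "I-person", "O", "B-location", "O"]
print(bio_to_spans(example))  # [('person', 0, 2), ('location', 3, 4)]
```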
86
+ ## How To Use
87
+
88
+ ```python
89
+ import torch
90
+ from transformers import pipeline
91
+
92
+ # Load the NER pipeline
93
+ ner_pipeline = pipeline("ner", model="NLPclass/Named-entity-recognition")
94
+
95
+ # Example text in Persian
96
+ text = "باراک اوباما در هاوایی متولد شد."
97
+
98
+ # Perform NER
99
+ entities = ner_pipeline(text)
100
+
101
+ # Output the entities
102
+ print(entities)
103
+
104
+ ```
105
+
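Without aggregation, the `ner` pipeline returns one dictionary per predicted token, with keys such as `entity`, `score`, `index`, `start`, and `end`. The sketch below shows one possible way to merge those token-level predictions into entity strings using the character offsets. The helper `entity_texts`, the `min_score` threshold, and the sample predictions are all hypothetical: the sample list is hand-written for illustration and is not real output from this model.

```python
def entity_texts(text, predictions, min_score=0.5):
    """Merge token-level predictions into (label, surface_text) pairs via character offsets."""
    merged = []
    for pred in predictions:
        if pred["score"] < min_score:
            continue  # drop low-confidence tokens (threshold is an assumption)
        prefix, _, label = pred["entity"].partition("-")
        last = merged[-1] if merged else None
        # Extend the previous entity only for an adjacent "I-" token of the same type.
        if prefix == "I" and last and last["label"] == label and pred["index"] == last["index"] + 1:
            last["end"] = pred["end"]
            last["index"] = pred["index"]
        else:
            merged.append({"label": label, "start": pred["start"],
                           "end": pred["end"], "index": pred["index"]})
    return [(m["label"], text[m["start"]:m["end"]]) for m in merged]


text = "باراک اوباما در هاوایی متولد شد."
# Hand-written, hypothetical predictions in the pipeline's output format.
sample_predictions = [
    {"entity": "B-person", "score": 0.99, "index": 1, "start": 0, "end": 5},
    {"entity": "I-person", "score": 0.98, "index": 2, "start": 6, "end": 12},
    {"entity": "B-location", "score": 0.97, "index": 4, "start": 16, "end": 22},
]
result = entity_texts(text, sample_predictions)
print(result)
```

The Transformers pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs similar grouping internally; the manual version here is only meant to make the offsets explicit.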
106
+ ```bibtex
107
+ @misc{mansoorhamidzadeh,
108
+ author = {mansoorhamidzadeh},
109
+ title = {Named-Entity-Recognition for Persian using Transformers},
110
+ year = {2024},
111
+ publisher = {Hugging Face},
112
+ howpublished = {\url{https://huggingface.co/mansoorhamidzadeh/Named-entity-recognition}},
113
+ }
114
+ ```