Tevfik istanbullu
committed on
Update README.md
README.md
CHANGED
@@ -1,3 +1,43 @@
|
|
1 |
-
---
|
2 |
-
license: mit
|
3 |
-
|
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
language:
|
4 |
+
- ar
|
5 |
+
metrics:
|
6 |
+
- accuracy
|
7 |
+
---
|
8 |
+
### Arabic Named Entity Recognition (NER) Model
|
9 |
+
|
10 |
+
# Overview
|
11 |
+
|
12 |
+
This NER model is designed specifically for the Arabic language. Built from scratch without pretrained models, it recognizes entities such as:
|
13 |
+
company names, person names, cities, etc.
|
14 |
+
|
15 |
+
The model is trained using TensorFlow and works with a custom dataset split into training, validation, and test sets.
|
16 |
+
|
17 |
+
# Model Highlights
|
18 |
+
- Language: Arabic
|
19 |
+
- Framework: TensorFlow
|
20 |
+
- Data Format: Plain-text (.txt) files with train, validation, and test splits
|
21 |
+
|
22 |
+
# Entities Recognized:
|
23 |
+
- ORG: Organizations (e.g., company names)
|
24 |
+
- LOC: Locations (e.g., cities, countries)
|
25 |
+
- PERS: Persons (e.g., names, excluding common/popular names)
|
26 |
+
- MISC: Miscellaneous (e.g., other identifiable private information)
|
27 |
+
|
28 |
+
- Intended Use: Arabic text processing, personal data anonymization, data extraction.
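
As an illustration of the anonymization use case, here is a minimal sketch of how per-token tags could be turned into masked text. It assumes the model emits one tag per token from the set listed above, plus an `O` tag for non-entity tokens; the `O` tag, the `anonymize` helper, and the sample tokens are hypothetical and not part of this repository.

```python
def anonymize(tokens, tags):
    """Replace tagged tokens with a placeholder such as [PERS] or [ORG].

    Assumes one tag per token, with "O" marking non-entity tokens
    (an assumption; the README does not state the tagging scheme).
    Consecutive tokens with the same tag collapse into one placeholder,
    so two adjacent entities of the same type are merged.
    """
    out = []
    prev = "O"
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            out.append(tok)          # keep non-entity tokens unchanged
        elif tag != prev:
            out.append(f"[{tag}]")   # first token of a new entity span
        prev = tag
    return " ".join(out)


# Illustrative input (transliterated tokens, made-up tags)
tokens = ["Ahmed", "Ali", "works", "at", "Aramco"]
tags = ["PERS", "PERS", "O", "O", "ORG"]
print(anonymize(tokens, tags))
```

The same loop works on Arabic tokens unchanged, since it only compares tags and never inspects the token text.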
|
29 |
+
|
30 |
+
# Dataset and Preprocessing
|
31 |
+
The dataset used in this model is split into three parts:
|
32 |
+
|
33 |
+
- Training Set: For model training.
|
34 |
+
- Validation Set: For tuning model hyperparameters and monitoring overfitting.
|
35 |
+
- Test Set: For evaluating final model performance.
|
36 |
+
Each sample in the dataset is labeled with its entities for supervised learning.
|
37 |
+
Data preprocessing steps include tokenization, normalization, and conversion of entity labels into a format compatible with TensorFlow.
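
The preprocessing steps named above could look roughly like the following sketch. It is not the author's pipeline: the specific normalization rules (stripping diacritics and tatweel, unifying alef variants), the whitespace tokenizer, the `O` tag, and every function name here are assumptions chosen for illustration.

```python
import re

# Hypothetical label set: the four entity types from this README plus an
# assumed "O" tag for non-entity tokens.
LABELS = ["O", "ORG", "LOC", "PERS", "MISC"]
LABEL_TO_ID = {label: i for i, label in enumerate(LABELS)}

# Arabic short-vowel marks (U+064B to U+0652) and tatweel (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Alef with madda, hamza above, hamza below.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize(text):
    """Strip diacritics and tatweel, then map alef variants to bare alef."""
    text = DIACRITICS.sub("", text)
    return ALEF_VARIANTS.sub("\u0627", text)


def tokenize(text):
    """Whitespace tokenization after normalization (a simplifying assumption)."""
    return normalize(text).split()


def encode_labels(tags):
    """Convert string tags to integer ids, ready to wrap in a tensor."""
    return [LABEL_TO_ID[tag] for tag in tags]
```

The integer ids returned by `encode_labels` can then be padded to a fixed length and passed to TensorFlow as training targets.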
|
38 |
+
|
39 |
+
|
40 |
+
# Model Evaluation
|
41 |
+
The model achieved an accuracy of 0.9675 on the test set, indicating strong performance in recognizing and classifying entities in Arabic text.
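
The README does not say how the accuracy figure was computed. One common choice for sequence labeling is token-level accuracy that skips padding positions; the sketch below shows that calculation under that assumption. The `token_accuracy` helper and the `pad_id` value of `-1` are hypothetical.

```python
def token_accuracy(true_ids, pred_ids, pad_id=-1):
    """Fraction of non-padding tokens whose predicted label matches the gold label.

    true_ids and pred_ids are lists of equal-length label-id sequences.
    Positions where the gold label equals pad_id are ignored.
    """
    correct = 0
    total = 0
    for true_seq, pred_seq in zip(true_ids, pred_ids):
        for t, p in zip(true_seq, pred_seq):
            if t == pad_id:
                continue  # padding does not count toward the score
            total += 1
            correct += int(t == p)
    return correct / total if total else 0.0
```

Because most tokens in typical text are non-entities, token-level accuracy can be high even when entity spans are missed, so per-entity precision, recall, and F1 are a useful complement.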
|
42 |
+
|
43 |
+
|