---
license: mit
language:
- ar
metrics:
- accuracy
datasets:
- arbml/CLEANANERCorp
pipeline_tag: token-classification
---
### Arabic Named Entity Recognition (NER) Model

# Overview

This is a Named Entity Recognition (NER) model designed specifically for the Arabic language. It was built from scratch, without the use of pretrained models, and recognizes entities such as company names, person names, and cities.

The model is trained using TensorFlow and works with a custom dataset split into training, validation, and test sets. 

# Model Highlights
- Language: Arabic
- Framework: TensorFlow
- Data Format: Text files (txt format) with train, validation, and test splits

# Entities Recognized:
- ORG: Organizations (e.g., company names)
- LOC: Locations (e.g., cities, countries)
- PERS: Persons (e.g., names, excluding common/popular names)
- MISC: Miscellaneous (e.g., other identifiable private information)

Intended Use: Arabic text processing, personal data anonymization, and data extraction.
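
As an illustration of the anonymization use case, the sketch below replaces tokens tagged as entities with one placeholder per entity span. It assumes the model's predictions are available as BIO tags (for example `B-PERS`, `I-PERS`) aligned one-to-one with the input tokens. The `anonymize` helper and the placeholder format are hypothetical and not part of the released model.

```python
def anonymize(tokens, tags):
    """Replace each predicted entity span with a single [TYPE] placeholder.

    Assumes BIO tags such as "B-PERS" / "I-PERS" / "O", one per token.
    """
    out = []
    prev_type = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            out.append(token)
            prev_type = None
            continue
        prefix, _, ent_type = tag.partition("-")
        # Emit a placeholder at the start of a span: on a B- tag, or when an
        # I- tag does not continue an entity of the same type.
        if prefix == "B" or ent_type != prev_type:
            out.append(f"[{ent_type}]")
        prev_type = ent_type
    return out


# Hypothetical example sentence and predicted tags.
tokens = ["يعمل", "أحمد", "علي", "في", "دبي"]
tags = ["O", "B-PERS", "I-PERS", "O", "B-LOC"]
print(" ".join(anonymize(tokens, tags)))
```

Collapsing each span to a single placeholder hides how many tokens the original name had, which is usually preferable when the goal is to remove personal data.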

# Dataset and Preprocessing
The dataset used in this model is split into three parts:

- Training Set: For model training.
- Validation Set: For tuning model hyperparameters and monitoring overfitting.
- Test Set: For evaluating final model performance.

Each sample in the dataset contains labeled entities for supervised learning.
Data preprocessing steps include tokenization, normalization, and conversion of entity labels into a format compatible with TensorFlow.
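
The preprocessing steps above could be sketched as follows. This is a minimal illustration, not the author's actual pipeline. It assumes a CoNLL-style text file with one `token label` pair per line and a blank line between sentences, a BIO tagging scheme over the four entity types, and a simple normalization (removing diacritics and tatweel, unifying alef variants). The names `normalize`, `read_conll`, `build_vocab`, and `encode` are hypothetical.

```python
import re

# Assumed BIO label set over the four entity types listed in this card.
LABELS = ["O"] + [f"{p}-{t}" for t in ("ORG", "LOC", "PERS", "MISC") for p in ("B", "I")]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}

# Arabic diacritics (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Alef with madda / hamza above / hamza below, mapped to bare alef (U+0627).
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize(token):
    """Strip diacritics and tatweel, and unify alef variants."""
    token = _DIACRITICS.sub("", token)
    return _ALEF_VARIANTS.sub("\u0627", token)


def read_conll(text):
    """Parse 'token label' lines into a list of (tokens, tags) sentences."""
    sentences, tokens, tags = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line ends the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.rsplit(maxsplit=1)
        tokens.append(normalize(token))
        tags.append(tag)
    if tokens:
        sentences.append((tokens, tags))
    return sentences


def build_vocab(sentences):
    """Map each training token to an integer id; 0 is padding, 1 is unknown."""
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for tokens, _ in sentences:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(sentences, vocab, max_len):
    """Convert sentences to fixed-length lists of token ids and label ids."""
    X, Y = [], []
    for tokens, tags in sentences:
        ids = [vocab.get(t, vocab["<UNK>"]) for t in tokens][:max_len]
        labels = [LABEL2ID[t] for t in tags][:max_len]
        pad = max_len - len(ids)
        X.append(ids + [vocab["<PAD>"]] * pad)
        Y.append(labels + [LABEL2ID["O"]] * pad)  # padded positions get "O"
    return X, Y
```

The resulting nested lists can be passed to `tf.constant` or `numpy.array` to obtain integer tensors of shape `(num_sentences, max_len)`.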


# Model Evaluation
The model achieved an accuracy of 0.9675 on the test set, indicating strong performance in recognizing and classifying entities in Arabic text.
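
For reference, a token-level accuracy of this kind can be computed as the fraction of non-padding positions where the predicted label matches the gold label. The sketch below is an assumption about the metric, since this card does not state how the figure was computed or whether padding was excluded. `token_accuracy` and `pad_id` are illustrative names.

```python
def token_accuracy(y_true, y_pred, token_ids, pad_id=0):
    """Fraction of non-padding tokens whose predicted label equals the gold label.

    y_true, y_pred, and token_ids are lists of equal-length integer sequences.
    """
    correct = total = 0
    for gold_seq, pred_seq, ids in zip(y_true, y_pred, token_ids):
        for gold, pred, tok in zip(gold_seq, pred_seq, ids):
            if tok == pad_id:  # skip padded positions
                continue
            total += 1
            correct += int(gold == pred)
    return correct / total if total else 0.0
```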