---
license: mit
language:
- ar
metrics:
- accuracy
datasets:
- arbml/CLEANANERCorp
pipeline_tag: token-classification
---
### Arabic Named Entity Recognition (NER) Model

# Overview
This is a named entity recognition (NER) model designed specifically for the Arabic language. Built from scratch without the use of pretrained models, it recognizes entities such as company names, personal names, and cities. The model is trained with TensorFlow on a custom dataset split into training, validation, and test sets.

# Model Highlights
- Language: Arabic
- Framework: TensorFlow
- Data Format: Text files (txt format) with train, validation, and test splits

# Entities Recognized:
- ORG: Organizations (e.g., company names)
- LOC: Locations (e.g., cities, countries)
- PERS: Persons (e.g., names, excluding common/popular names)
- MISC: Miscellaneous (e.g., other identifiable private information)

- Intended Use: Arabic text processing, personal data anonymization, data extraction.

# Dataset and Preprocessing
The dataset used for this model is split into three parts:
- Training Set: For model training.
- Validation Set: For tuning model hyperparameters and monitoring overfitting.
- Test Set: For evaluating final model performance.

Each sample in the dataset contains labeled entities for supervised learning. Preprocessing includes tokenization, normalization, and conversion of the entity labels into a format compatible with TensorFlow.

# Model Evaluation
The model achieved an accuracy of 0.9675 on the test set, indicating strong performance in recognizing and classifying entities in Arabic text.
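
To make the preprocessing and evaluation steps more concrete, here is a minimal, framework-agnostic sketch in plain Python. It is not the model's actual pipeline, and several details are assumptions that this card does not state: the IOB2 tag names in `TAGS`, the one `token tag` pair per line file layout, the specific normalization rules (removing diacritics and tatweel, unifying alef variants), and the reading of the reported accuracy as token-level accuracy. The helper names `normalize`, `read_conll`, and `token_accuracy` are hypothetical.

```python
import re

# Hypothetical IOB2 tag set; the card only names ORG, LOC, PERS, and MISC.
TAGS = ["O", "B-ORG", "I-ORG", "B-LOC", "I-LOC",
        "B-PERS", "I-PERS", "B-MISC", "I-MISC"]
TAG2ID = {tag: i for i, tag in enumerate(TAGS)}

# Arabic diacritics (U+064B to U+0652), superscript alef (U+0670), tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")
# Alef with madda, hamza above, and hamza below.
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize(token: str) -> str:
    """Strip diacritics and tatweel, and map alef variants to bare alef."""
    token = _DIACRITICS.sub("", token)
    return _ALEF_VARIANTS.sub("\u0627", token)


def read_conll(lines):
    """Parse 'token tag' lines (assumed layout); a blank line ends a sentence."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.rsplit(maxsplit=1)
        tokens.append(normalize(token))
        tags.append(TAG2ID[tag])  # integer ids are what a TensorFlow model would consume
    if tokens:
        sentences.append((tokens, tags))
    return sentences


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag id equals the gold tag id."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


if __name__ == "__main__":
    sample = ["أحمد B-PERS", "في O", "القاهرة B-LOC", "", "جوجل B-ORG"]
    sentences = read_conll(sample)
    gold = [tags for _, tags in sentences]
    pred = [[5, 0, 0], [1]]  # made-up predictions with one wrong tag
    print(token_accuracy(gold, pred))  # 0.75
```

Because most tokens in NER data carry the `O` tag, a token-level accuracy computed this way is usually reported alongside entity-level precision, recall, and F1 when comparing models.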