Kurdish Language Detector Model
This model is a fine-tuned version of abdulhade/RoBERTa-large-SizeCorpus_1B that classifies text as Kurdish or English. It was trained on a custom bilingual corpus to distinguish the two languages and label text segments accordingly.
Model Overview
- Model Type: Text classification (language detection)
- Base Model: abdulhade/RoBERTa-large-SizeCorpus_1B
- Languages Supported: English, Kurdish
- Training Data: Custom bilingual corpus of English and Kurdish text
- Primary Use Case: Identifying whether input text is in English or Kurdish
Model Performance
The model achieved the following results at evaluation:
- Evaluation Loss: 0.0012
- Evaluation Accuracy: 99.99%
- Evaluation F1 Score: 0.9999
- Evaluation Precision: 0.99999
- Evaluation Recall: 0.99983
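As a quick consistency check on the figures above, F1 can be recomputed from the reported precision and recall. This is a minimal sketch that assumes the reported F1 is the plain binary F1 (the harmonic mean of precision and recall); the model card does not state which averaging was used.

```python
# Reported evaluation metrics from the list above.
precision = 0.99999
recall = 0.99983

# F1 is the harmonic mean of precision and recall
# (assumption: binary F1, not a macro or weighted average).
f1 = 2 * precision * recall / (precision + recall)

print(f"Recomputed F1: {f1:.4f}")
```

Under that assumption, the recomputed value rounds to the reported 0.9999.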
Training Details
- Training Loss: 0.027
- Training Runtime: 40,500.85 seconds
- Samples per Second (Training): 72.35
- Steps per Second (Training): 4.52
- Epochs: 3
Evaluation Details
- Evaluation Runtime: 4,111.17 seconds
- Samples per Second (Evaluation): 237.58
- Steps per Second (Evaluation): 14.85
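The throughput figures in the training and evaluation details can be combined to estimate values the card does not report directly, such as the batch size and the number of samples. The sketch below assumes the per-second figures are averages over the whole runtime and that training samples are counted across all epochs; the derived numbers are estimates, not values stated by the author.

```python
# Figures copied from the training and evaluation details above.
train_runtime_s = 40_500.85
train_samples_per_s = 72.35
train_steps_per_s = 4.52
epochs = 3

eval_runtime_s = 4_111.17
eval_samples_per_s = 237.58
eval_steps_per_s = 14.85

# Samples per step approximates the effective batch size.
train_batch = train_samples_per_s / train_steps_per_s
eval_batch = eval_samples_per_s / eval_steps_per_s

# Runtime times throughput approximates the number of samples processed.
# Assumption: the training figure covers all epochs, so divide by the epoch count.
train_samples_per_epoch = train_runtime_s * train_samples_per_s / epochs
eval_samples = eval_runtime_s * eval_samples_per_s

train_hours = train_runtime_s / 3600

print(f"Estimated batch size (train / eval): {train_batch:.1f} / {eval_batch:.1f}")
print(f"Estimated training samples per epoch: {train_samples_per_epoch:,.0f}")
print(f"Estimated evaluation samples: {eval_samples:,.0f}")
print(f"Training time in hours: {train_hours:.2f}")
```

Both ratios come out close to 16 samples per step, and the training run works out to roughly 11 hours.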
Hardware and Environment
- Environment: Accelerated hardware (e.g., GPU)
- Default Inference Device: CPU (specify device=0 for GPU usage)
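One way to apply the device setting without hard-coding it is to pick the device at runtime. This is a minimal sketch: the helper names pick_device and load_detector are hypothetical, and it relies on the pipeline convention that device=-1 means CPU and device=0 means the first CUDA GPU.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) when one is available, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without torch there is no GPU backend to query; fall back to CPU.
        return -1
    return 0 if torch.cuda.is_available() else -1


def load_detector(device: int):
    """Build the text-classification pipeline on the chosen device."""
    # Imported lazily because building the pipeline downloads the model weights.
    from transformers import pipeline

    model_id = "abdulhade/kurdishRoBERTa-language-detector-1B"
    return pipeline("text-classification", model=model_id, tokenizer=model_id, device=device)


device = pick_device()
print(f"Selected device index: {device}")
```

Calling load_detector(device) then returns a pipeline that runs on the GPU when one is present and on the CPU otherwise.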
Quickstart Guide
Installation
Ensure you have the transformers and torch libraries installed:
pip install transformers torch
from transformers import pipeline
# Load the Kurdish Language Detector
kurdish_detector = pipeline(
    'text-classification',
    model='abdulhade/kurdishRoBERTa-language-detector-1B',
    tokenizer='abdulhade/kurdishRoBERTa-language-detector-1B',
)
# Perform a prediction
result = kurdish_detector("Insert your text here")
print(result)  # Outputs a list such as [{'label': 'LABEL_1', 'score': <probability>}]; LABEL_0 is English, LABEL_1 is Kurdish
# Custom function to map the labels
def map_labels(prediction):
    label_mapping = {
        'LABEL_0': 'English',
        'LABEL_1': 'Kurdish',
    }
    # Map the label and keep the score as is
    return {'label': label_mapping[prediction['label']], 'score': prediction['score']}
# Test the model with new input and map the labels
input_text_1 = "Hello World"
input_text_2 = "Hi dear برام دەنگ و باست"
# Get predictions
predictions_1 = kurdish_detector(input_text_1)
predictions_2 = kurdish_detector(input_text_2)
# Map and print results
mapped_predictions_1 = [map_labels(pred) for pred in predictions_1]
mapped_predictions_2 = [map_labels(pred) for pred in predictions_2]
print(input_text_1)
print(mapped_predictions_1) # Expected output: [{'label': 'English', 'score': <score>}]
print(input_text_2)
print(mapped_predictions_2) # Expected output: [{'label': 'Kurdish', 'score': <score>}]
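The same label mapping can be applied to a whole batch of predictions, optionally flagging low-confidence results, which may be useful for mixed-language inputs like the second example. This is a sketch only: label_batch and the 0.9 threshold are hypothetical choices, and the predictions below are made-up stand-ins for pipeline output so that the snippet runs without downloading the model.

```python
# Mapping taken from the quickstart above.
LABEL_NAMES = {"LABEL_0": "English", "LABEL_1": "Kurdish"}


def label_batch(predictions, min_score=0.9):
    """Map raw labels to language names and flag low-confidence predictions.

    min_score is an illustrative threshold, not a value recommended by the model card.
    """
    results = []
    for pred in predictions:
        language = LABEL_NAMES.get(pred["label"], "unknown")
        if pred["score"] < min_score:
            language = "uncertain"
        results.append({"language": language, "score": pred["score"]})
    return results


# Stand-in for kurdish_detector(["text one", "text two"]); the scores are invented.
fake_predictions = [
    {"label": "LABEL_0", "score": 0.99},
    {"label": "LABEL_1", "score": 0.62},
]

labelled = label_batch(fake_predictions)
print(labelled)
```

With the real model, passing a list of strings to kurdish_detector returns one prediction per input, which can be handed to label_batch directly.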