Kurdish Language Detector Model

This model is a fine-tuned version of abdulhade/RoBERTa-large-SizeCorpus_1B that classifies input text as Kurdish or English. It was fine-tuned on a custom bilingual corpus of Kurdish and English text.

Model Overview

  • Model Type: Text classification (language detection)
  • Base Model: abdulhade/RoBERTa-large-SizeCorpus_1B
  • Languages Supported: English, Kurdish
  • Training Data: Custom bilingual corpus of English and Kurdish text
  • Primary Use Case: Identifying whether input text is in English or Kurdish

Model Performance

On the evaluation set, the model achieved the following results:

  • Evaluation Loss: 0.0012
  • Evaluation Accuracy: 99.99%
  • Evaluation F1 Score: 0.9999
  • Evaluation Precision: 0.99999
  • Evaluation Recall: 0.99983
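
As a quick sanity check on the figures above: F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two values. The snippet below is only an illustrative check; the variable names are chosen here and are not part of the model's code.

```python
# Reported evaluation metrics, copied from the list above.
precision = 0.99999
recall = 0.99983

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))
```

Rounded to four decimals, the recomputed value agrees with the reported F1 score of 0.9999.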

Training Details

  • Training Loss: 0.027
  • Training Runtime: 40,500.85 seconds (about 11.25 hours)
  • Samples per Second (Training): 72.35
  • Steps per Second (Training): 4.52
  • Epochs: 3

Evaluation Details

  • Evaluation Runtime: 4,111.17 seconds (about 68.5 minutes)
  • Samples per Second (Evaluation): 237.58
  • Steps per Second (Evaluation): 14.85
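
The runtime and throughput figures can be combined to get a rough idea of the data volume and batch size. This is a back-of-the-envelope sketch, assuming the reported samples-per-second and steps-per-second values are averages over the whole run; the resulting numbers are estimates derived here, not values stated in the model card.

```python
# Figures copied from the Training Details and Evaluation Details lists.
train_runtime_s = 40_500.85
train_samples_per_s = 72.35
train_steps_per_s = 4.52
epochs = 3

eval_runtime_s = 4_111.17
eval_samples_per_s = 237.58
eval_steps_per_s = 14.85

# Samples processed = runtime * throughput (assumes whole-run averages).
train_samples_per_epoch = train_runtime_s * train_samples_per_s / epochs
eval_samples = eval_runtime_s * eval_samples_per_s

# Samples per step approximates the effective batch size.
train_batch = train_samples_per_s / train_steps_per_s
eval_batch = eval_samples_per_s / eval_steps_per_s

print(f"training samples per epoch (estimate): {train_samples_per_epoch:,.0f}")
print(f"evaluation samples (estimate): {eval_samples:,.0f}")
print(f"effective batch size (train / eval): {train_batch:.1f} / {eval_batch:.1f}")
```

Under these assumptions, both effective batch sizes come out at roughly 16, and each pass covers on the order of 977,000 samples.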

Hardware and Environment

  • Environment: Accelerated hardware (e.g., GPU)
  • Default Inference Device: CPU (pass device=0 to pipeline() to run on the first GPU)
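
One way to choose the device automatically is sketched below. It assumes the integer device convention of the transformers pipeline (0 for the first CUDA GPU, -1 for CPU); the helper names pick_device and load_detector are hypothetical and not part of this repository.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if one is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without torch there is no GPU backend to query, so fall back to CPU.
        return -1
    return 0 if torch.cuda.is_available() else -1


def load_detector(model_id: str = "abdulhade/kurdishRoBERTa-language-detector-1B"):
    """Build the text-classification pipeline on the selected device."""
    # Imported lazily so pick_device() can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id, device=pick_device())


# Usage (downloads the model, so it is left commented out here):
# detector = load_detector()
# print(detector("Hello World"))
```

Falling back to CPU keeps the same code usable on machines without a GPU, at the cost of slower inference.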

Quickstart Guide

Installation

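Access to the model files is gated, so first accept the conditions on the model page and authenticate (for example with huggingface-cli login). Then install the transformers library and torch: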

pip install transformers torch

Usage

from transformers import pipeline

# Load the Kurdish Language Detector
kurdish_detector = pipeline(
    'text-classification',
    model='abdulhade/kurdishRoBERTa-language-detector-1B',
    tokenizer='abdulhade/kurdishRoBERTa-language-detector-1B',
)

# Perform a prediction
result = kurdish_detector("Insert your text here")
print(result)  # Prints a one-element list, e.g. [{'label': 'LABEL_0', 'score': <probability>}]

# Custom function to map the labels
def map_labels(prediction):
    label_mapping = {
        'LABEL_0': 'English',
        'LABEL_1': 'Kurdish'
    }
    # Map the label and keep the score as is
    return {'label': label_mapping[prediction['label']], 'score': prediction['score']}

# Test the model with new input and map the labels
input_text_1 = "Hello World"
input_text_2 = "Hi dear    برام  دەنگ و باست"

# Get predictions
predictions_1 = kurdish_detector(input_text_1)
predictions_2 = kurdish_detector(input_text_2)

# Map and print results
mapped_predictions_1 = [map_labels(pred) for pred in predictions_1]
mapped_predictions_2 = [map_labels(pred) for pred in predictions_2]
print(input_text_1)
print(mapped_predictions_1)  # Expected output: [{'label': 'English', 'score': <score>}]
print(input_text_2)
print(mapped_predictions_2)  # Expected output: [{'label': 'Kurdish', 'score': <score>}]
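
For more than a couple of inputs, it can be convenient to classify a list of texts in one call. The sketch below assumes the pipeline returns one {'label', 'score'} dict per input when given a list of strings, which is the default behavior for text-classification in current transformers releases. The helper classify_batch and the stand-in fake_detector are hypothetical names introduced here so the example runs without downloading the model.

```python
from typing import Callable, Dict, List

# Same mapping as in map_labels above.
LABEL_NAMES = {"LABEL_0": "English", "LABEL_1": "Kurdish"}


def classify_batch(detector: Callable, texts: List[str]) -> List[Dict]:
    """Run the detector on a list of texts and attach readable language names."""
    raw = detector(texts)  # assumed: one {'label', 'score'} dict per input text
    return [
        {"text": text, "language": LABEL_NAMES[pred["label"]], "score": pred["score"]}
        for text, pred in zip(texts, raw)
    ]


def fake_detector(texts: List[str]) -> List[Dict]:
    """Hypothetical stand-in for the real pipeline: ASCII-only text counts as English."""
    return [
        {"label": "LABEL_0" if text.isascii() else "LABEL_1", "score": 1.0}
        for text in texts
    ]


rows = classify_batch(fake_detector, ["Hello World", "دەنگ و باست"])
for row in rows:
    print(row)
```

With the real model, pass kurdish_detector in place of fake_detector; the scores will then be the model's probabilities rather than the fixed 1.0 used by the stand-in.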

Model size: 83.5M params (Safetensors, tensor type F32)