Banglish-to-Bangla Transliteration Model

Model Details

Model Description

This model transliterates Banglish (Bengali written in Roman script) into Bengali script. It was fine-tuned from facebook/mbart-large-50-many-to-many-mmt on the SKNahin/bengali-transliteration-data dataset.

  • Developed by: Md. Farhan Masud Shohag
  • Model type: Sequence-to-Sequence (Translation)
  • Language(s): Banglish → Bengali (bn_BD)
  • License: Apache 2.0
  • Fine-tuned from: facebook/mbart-large-50-many-to-many-mmt

Uses

Direct Use

  • Transliteration of Banglish text to Bengali script for social media, messaging, and formal communication.

Downstream Use

  • Fine-tuning for translation tasks between Bengali and other languages.
  • Integration into chatbots or virtual assistants.

Out-of-Scope Use

  • General-purpose language translation between unrelated languages.
  • Handling code-mixed input (e.g., Banglish combined with English in the same sentence).

Bias, Risks, and Limitations

Biases

  • The dataset may include informal phrases, potentially reducing performance on formal language.
  • Performance may degrade for long or complex sentences.

Limitations

  • Model performance may vary for rare phrases or slang.
  • Mixed-language inputs (e.g., Banglish combined with English) are not handled reliably.

Recommendations

Users should evaluate outputs for their specific use cases, especially in formal contexts. Additional filtering or pre-processing may be required.
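
As one possible form of the filtering and pre-processing mentioned above, the sketch below normalizes input text and checks how much of an output actually falls in the Bengali Unicode block (U+0980 to U+09FF). The function names and the 0.8 threshold are illustrative assumptions, not part of this model or its training pipeline.

```python
import re
import unicodedata

# Bengali Unicode block; the bounds come from the Unicode standard.
BENGALI_START, BENGALI_END = 0x0980, 0x09FF

def normalize_input(text: str) -> str:
    """Apply NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()

def bengali_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that lie in the Bengali block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    in_block = sum(BENGALI_START <= ord(c) <= BENGALI_END for c in chars)
    return in_block / len(chars)

def looks_transliterated(output: str, threshold: float = 0.8) -> bool:
    """Heuristic filter: True when most of the output is Bengali script.

    The threshold is a hypothetical default; tune it for your data.
    """
    return bengali_ratio(output) >= threshold
```

The threshold is kept below 1.0 because digits, Latin punctuation, and the danda (।, which sits in the Devanagari block) are not counted as Bengali characters. Outputs that fail the check can be flagged for manual review rather than discarded.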


How to Use

Example Code

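from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# "your-username/banglish-to-bangla-mbart" is a placeholder; replace it with the actual model repository ID.
model = MBartForConditionalGeneration.from_pretrained("your-username/banglish-to-bangla-mbart")
tokenizer = MBart50TokenizerFast.from_pretrained("your-username/banglish-to-bangla-mbart")

def translate(text):
    # Tokenize the Banglish input; anything beyond 64 tokens is truncated.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
    # Pass the attention mask together with the input IDs so padding tokens are ignored.
    outputs = model.generate(**inputs, max_length=64, num_beams=5, early_stopping=True)
    # Decode the best beam back to text, dropping language and padding tokens.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translate("ami tomake valobashi"))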
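
Because the example truncates input at 64 tokens, longer text would be cut off silently. A minimal sketch of one workaround is to split the input into word-based chunks and transliterate each chunk separately. The helper names and the 20-word default are assumptions for illustration; word count is only a rough proxy for token count, and splitting at arbitrary word boundaries may cost some context.

```python
from typing import Callable, List

def chunk_words(text: str, max_words: int = 20) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def transliterate_long(
    text: str,
    transliterate_fn: Callable[[str], str],
    max_words: int = 20,  # hypothetical default, chosen to stay well under 64 tokens
) -> str:
    """Transliterate each chunk on its own and join the results with spaces."""
    return " ".join(transliterate_fn(chunk) for chunk in chunk_words(text, max_words))
```

With the `translate` function defined above, a call would look like `transliterate_long(long_text, translate)`.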

Model size: 611M params (Safetensors, tensor type F32)