Banglish-to-Bangla Transliteration Model
Model Details
Model Description
This model is designed to transliterate Banglish (Bengali written in Roman script) into Bengali script. It is fine-tuned from the facebook/mbart-large-50-many-to-many-mmt model using the SKNahin/bengali-transliteration-data dataset.
- Developed by: Md. Farhan Masud Shohag
- Model type: Sequence-to-Sequence (Translation)
- Language(s): Banglish → Bengali (bn_BD)
- License: Apache 2.0
- Fine-tuned from: facebook/mbart-large-50-many-to-many-mmt
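The exact training procedure is not documented in this card. The sketch below is a minimal, assumed recipe for reproducing the fine-tuning with the Hugging Face `Seq2SeqTrainer`: the dataset column names (`rm` for romanized Banglish, `bn` for Bengali) and all hyperparameters are placeholders to adjust, and mapping both sides to mBART-50's `bn_IN` code is only a convention, since the tokenizer has no Banglish language code.

```python
# Minimal fine-tuning sketch (assumed recipe, not the published training script).
from datasets import load_dataset
from transformers import (
    MBartForConditionalGeneration,
    MBart50TokenizerFast,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

dataset = load_dataset("SKNahin/bengali-transliteration-data")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")

# mBART-50 has no Banglish code; treating both sides as bn_IN is an assumption.
tokenizer.src_lang = "bn_IN"
tokenizer.tgt_lang = "bn_IN"

def preprocess(batch):
    # Column names "rm" (Banglish) and "bn" (Bengali) are placeholders; check the dataset schema.
    return tokenizer(batch["rm"], text_target=batch["bn"], max_length=64, truncation=True)

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="banglish-to-bangla-mbart",
    per_device_train_batch_size=8,   # illustrative hyperparameters
    learning_rate=5e-5,
    num_train_epochs=3,
    fp16=True,                       # assumes a GPU is available
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

After training, `trainer.save_model(...)` and `tokenizer.save_pretrained(...)` produce the checkpoint loaded in the example under "How to Use".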
Model Sources
- Repository: https://huggingface.co./your-username/banglish-to-bangla-mbart
- Dataset: SKNahin/bengali-transliteration-data
- Demo (Optional): [Colab Notebook Link or Web Demo]
Uses
Direct Use
- Transliteration of Banglish text to Bengali script for social media, messaging, and formal communication.
Downstream Use
- Fine-tuning for translation tasks between Bengali and other languages.
- Integration into chatbots or virtual assistants.
Out-of-Scope Use
- General-purpose language translation between unrelated languages.
- Handling code-mixed input (e.g., sentences that mix Banglish and English).
Bias, Risks, and Limitations
Biases
- The dataset may include informal phrases, potentially reducing performance on formal language.
- Performance may degrade for long or complex sentences.
Limitations
- Model performance may vary for rare phrases or slang.
- Does not handle mixed-language input effectively.
Recommendations
Users should evaluate outputs for their specific use cases, especially in formal contexts. Additional filtering or pre-processing may be required.
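As one example of such pre-processing, the sketch below uses purely illustrative helpers (none of these names ship with the model or dataset) to normalize chat-style Banglish and flag heavily code-mixed input before it is passed to the model:

```python
import re

# Hypothetical pre-filtering helpers; adjust the word list and threshold for your data.
COMMON_ENGLISH = {"the", "and", "is", "are", "please", "ok"}  # illustrative only

def normalize_banglish(text: str) -> str:
    """Lowercase, collapse whitespace, and shorten long character repeats ("khubbb" -> "khubb")."""
    text = re.sub(r"\s+", " ", text.lower().strip())
    return re.sub(r"(.)\1{2,}", r"\1\1", text)

def looks_code_mixed(text: str, threshold: float = 0.5) -> bool:
    """Heuristic flag: True if at least `threshold` of the tokens are plain English words."""
    tokens = text.split()
    if not tokens:
        return False
    return sum(t in COMMON_ENGLISH for t in tokens) / len(tokens) >= threshold

cleaned = normalize_banglish("Ami   tomake  khubbb valobashi")
if not looks_code_mixed(cleaned):
    print(cleaned)  # safe to pass to the transliteration model (see "How to Use" below)
```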
How to Use
Example Code
```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned model and tokenizer (replace "your-username" with the actual repo id).
model = MBartForConditionalGeneration.from_pretrained("your-username/banglish-to-bangla-mbart")
tokenizer = MBart50TokenizerFast.from_pretrained("your-username/banglish-to-bangla-mbart")

def translate(text):
    # Tokenize the Banglish input and generate the Bengali-script output with beam search.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
    outputs = model.generate(inputs.input_ids, max_length=64, num_beams=5, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translate("ami tomake valobashi"))
```
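For multiple sentences, a batched call avoids repeated single-sentence generation. The sketch below reuses the model and tokenizer loaded above; passing the full tokenizer output lets the attention mask handle padding:

```python
# Batch transliteration sketch: pad a list of Banglish sentences and decode all outputs at once.
sentences = ["ami tomake valobashi", "tumi kemon acho"]
batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=64)
outputs = model.generate(**batch, max_length=64, num_beams=5, early_stopping=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```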