Banglish-to-Bangla Transliteration Model
Model Details
Model Description
This model is designed to transliterate Banglish (Bengali written in Roman script) into Bengali script. It is fine-tuned from the facebook/mbart-large-50-many-to-many-mmt model using the SKNahin/bengali-transliteration-data dataset.
- Developed by: Md. Farhan Masud Shohag
- Model type: Sequence-to-Sequence (Translation)
- Language(s): Banglish → Bengali (bn_BD)
- License: Apache 2.0
- Fine-tuned from: facebook/mbart-large-50-many-to-many-mmt
Model Sources
- Repository: https://huggingface.co/your-username/banglish-to-bangla-mbart
- Dataset: SKNahin/bengali-transliteration-data
- Demo (Optional): [Colab Notebook Link or Web Demo]
Uses
Direct Use
- Transliteration of Banglish text to Bengali script for social media, messaging, and formal communication.
Downstream Use
- Fine-tuning for translation tasks between Bengali and other languages (a minimal sketch follows this list).
- Integration into chatbots or virtual assistants.
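A minimal sketch of what such downstream fine-tuning could look like, assuming a hypothetical parallel dataset with "source" and "target" text columns; the dataset name, column names, language codes, and hyperparameters below are placeholders, not values provided by this card:

# A sketch only: dataset name, column names, language codes, and hyperparameters
# are assumptions, not values specified by this model card.
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MBartForConditionalGeneration,
    MBart50TokenizerFast,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "your-username/banglish-to-bangla-mbart"
tokenizer = MBart50TokenizerFast.from_pretrained(checkpoint)
model = MBartForConditionalGeneration.from_pretrained(checkpoint)

# Example direction: Bengali -> English, using the base mBART-50 language codes.
tokenizer.src_lang = "bn_IN"
tokenizer.tgt_lang = "en_XX"

dataset = load_dataset("your-username/parallel-corpus")  # hypothetical dataset

def preprocess(batch):
    model_inputs = tokenizer(batch["source"], max_length=64, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="mbart-bn-en", per_device_train_batch_size=16,
                                  learning_rate=2e-5, num_train_epochs=3),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

Any parallel corpus in the Hugging Face datasets format can be swapped in; only the preprocessing function needs to change to match its column names.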
Out-of-Scope Use
- General-purpose language translation between unrelated languages.
- Handling code-mixed input, e.g., sentences that mix Banglish and English.
Bias, Risks, and Limitations
Biases
- The dataset may be skewed toward informal phrases, which can reduce performance on formal text.
- Performance may degrade for long or complex sentences.
Limitations
- Model performance may vary for rare phrases or slang.
- Does not handle mixed-language input effectively.
Recommendations
Users should evaluate outputs for their specific use cases, especially in formal contexts. Additional filtering or pre-processing may be required.
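One illustrative pre-processing step (a sketch only; these normalization rules are not part of the model) is to lightly clean Banglish input before passing it to the model, e.g., lowercasing, capping stretched letters, and collapsing whitespace:

import re

def normalize_banglish(text):
    # Illustrative normalization: lowercase, cap letter repetition at two
    # ("Amiii" -> "amii"), and collapse runs of whitespace.
    text = text.lower()
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(normalize_banglish("Amiii   Tomakee Valobashi"))  # -> "amii tomakee valobashi"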
How to Use
Example Code
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("your-username/banglish-to-bangla-mbart")
tokenizer = MBart50TokenizerFast.from_pretrained("your-username/banglish-to-bangla-mbart")

def translate(text):
    # Tokenize the Banglish input; inputs longer than 64 tokens are truncated.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
    # Beam search (5 beams) tends to give more fluent output than greedy decoding.
    outputs = model.generate(inputs.input_ids, max_length=64, num_beams=5, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translate("ami tomake valobashi"))
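Whether the fine-tuned checkpoint needs an explicit target-language token is not stated in this card; the base mBART-50 model selects its output language via forced_bos_token_id, and its Bengali code is "bn_IN". The batched variant below makes this explicit as a hedged sketch; verify the language code against the actual tokenizer before relying on it.

def translate_batch(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=64)
    outputs = model.generate(
        **inputs,  # includes the attention mask, which matters for padded batches
        max_length=64,
        num_beams=5,
        early_stopping=True,
        forced_bos_token_id=tokenizer.lang_code_to_id["bn_IN"],  # assumed Bengali code
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

print(translate_batch(["ami tomake valobashi", "tumi kemon acho"]))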