---
library_name: transformers
tags:
- mbart
- translation
- banglish-to-bangla
- bengali
---

# Banglish-to-Bangla Transliteration Model

## Model Details

### Model Description

This model is designed to transliterate **Banglish** (Bengali written in Roman script) into **Bengali script**. It is fine-tuned from the **facebook/mbart-large-50-many-to-many-mmt** model using the **SKNahin/bengali-transliteration-data** dataset.

- **Developed by:** Md. Farhan Masud Shohag
- **Model type:** Sequence-to-Sequence (Translation)
- **Language(s):** Banglish → Bengali (bn_BD)
- **License:** Apache 2.0
- **Fine-tuned from:** facebook/mbart-large-50-many-to-many-mmt

---
## Model Sources

- **Repository:** [https://huggingface.co/your-username/banglish-to-bangla-mbart](https://huggingface.co/your-username/banglish-to-bangla-mbart)
- **Dataset:** [SKNahin/bengali-transliteration-data](https://huggingface.co/datasets/SKNahin/bengali-transliteration-data)
- **Demo (Optional):** [Colab Notebook Link or Web Demo]

---
## Uses

### Direct Use

- Transliteration of **Banglish text** into **Bengali script** for social media, messaging, and formal communication.

### Downstream Use

- Fine-tuning for **translation tasks** between Bengali and other languages.
- Integration into chatbots or virtual assistants.

### Out-of-Scope Use

- General-purpose translation between unrelated languages.
- **Code-mixed input** (e.g., Banglish combined with English).

---
## Bias, Risks, and Limitations

### Biases

- The dataset may include **informal phrases**, potentially reducing performance on **formal language**.
- Performance may degrade for **long or complex sentences**.

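One possible way to work around the long-sentence issue is to split long text into sentence-sized pieces and transliterate each piece separately. The sketch below is a minimal illustration in plain Python. It assumes sentence boundaries are marked by `.`, `?`, `!`, or the Bengali danda `।`, and it uses a character budget as a rough stand-in for a token limit. The names `split_sentences` and `max_chars` are hypothetical and are not part of this model or of `transformers`.

```python
import re

# Hypothetical helper: not part of the model or the transformers library.
# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.?!।])\s+")


def split_sentences(text, max_chars=200):
    """Split text at sentence punctuation, then wrap overly long pieces on spaces."""
    pieces = []
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        # Greedily pack words so that no piece exceeds max_chars
        # (a single word longer than max_chars is kept whole).
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if current and len(candidate) > max_chars:
                pieces.append(current)
                current = word
            else:
                current = candidate
        if current:
            pieces.append(current)
    return pieces
```

Each returned piece could then be passed to the transliteration function on its own and the outputs joined with spaces. Whether this actually improves quality for a given input would need to be checked empirically.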

### Limitations

- Model performance may vary for **rare phrases or slang**.
- Does not support **mixed language** inputs effectively.

### Recommendations

Users should evaluate outputs for their specific use cases, especially in formal contexts. Additional filtering or pre-processing may be required.

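One way to act on this recommendation is to score model outputs against a small set of hand-checked references from the target domain. The sketch below computes a corpus-level character error rate (CER), that is, Levenshtein distance divided by reference length, in plain Python. The choice of CER as the metric and the function names are assumptions made for illustration; the model card itself does not specify an evaluation method.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(predictions, references):
    """Total edit distance divided by total reference length."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return total_edits / total_chars
```

Note that Python compares strings by Unicode code point, so a Bengali vowel sign or hasanta counts as its own character here. That is a simplification; a grapheme-aware comparison may be more appropriate for some use cases.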

---

## How to Use

### Example Code

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned model and its tokenizer from the Hub.
model = MBartForConditionalGeneration.from_pretrained("your-username/banglish-to-bangla-mbart")
tokenizer = MBart50TokenizerFast.from_pretrained("your-username/banglish-to-bangla-mbart")


def translate(text):
    """Transliterate a Banglish string into Bengali script."""
    # Inputs longer than 64 tokens are truncated.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
    # Pass the full encoding so the attention mask is used alongside the input IDs.
    outputs = model.generate(**inputs, max_length=64, num_beams=5, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


print(translate("ami tomake valobashi"))
```