---
library_name: transformers
tags:
- mbart
- translation
- banglish-to-bangla
- bengali
---
# Banglish-to-Bangla Transliteration Model
## Model Details
### Model Description
This model is designed to transliterate **Banglish** (Bengali written in Roman script) into **Bengali script**. It is fine-tuned from the **facebook/mbart-large-50-many-to-many-mmt** model using the **SKNahin/bengali-transliteration-data** dataset.
- **Developed by:** Md. Farhan Masud Shohag
- **Model type:** Sequence-to-Sequence (Translation)
- **Language(s):** Banglish → Bengali (bn_BD)
- **License:** Apache 2.0
- **Fine-tuned from:** facebook/mbart-large-50-many-to-many-mmt
---
## Model Sources
- **Repository:** [https://huggingface.co/your-username/banglish-to-bangla-mbart](https://huggingface.co/your-username/banglish-to-bangla-mbart)
- **Dataset:** [SKNahin/bengali-transliteration-data](https://huggingface.co/datasets/SKNahin/bengali-transliteration-data)
- **Demo (Optional):** [Colab Notebook Link or Web Demo]
---
## Uses
### Direct Use
- Transliteration of **Banglish text** to **Bengali script** for social media, messaging, and formal communication.
### Downstream Use
- Fine-tuning for **translation tasks** between Bengali and other languages.
- Integration into chatbots or virtual assistants.
### Out-of-Scope Use
- General-purpose language translation between unrelated languages.
- Handling **code-mixed languages** (e.g., Banglish + English combinations).
---
## Bias, Risks, and Limitations
### Biases
- The training data may skew toward **informal phrasing**, which can reduce accuracy on **formal language**.
- Performance may degrade for **long or complex sentences**; the example code truncates inputs at 64 tokens.
### Limitations
- Model performance may vary for **rare phrases or slang**.
- Does not handle **mixed-language** inputs (e.g., Banglish combined with English) effectively.
### Recommendations
Users should evaluate outputs for their specific use cases, especially in formal contexts. Additional filtering or pre-processing may be required.
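
The pre-processing mentioned above could look something like the following. This is a minimal sketch, not part of the model's own pipeline: the helper names (`normalize`, `contains_bengali_script`, `split_into_chunks`, `prepare`) and the 20-word chunk size are hypothetical choices for illustration, and the word count is only a rough stand-in for the 64-token limit used in the example code. The sketch normalizes whitespace, rejects input that already contains Bengali script, and splits long input into shorter chunks that can be transliterated one at a time.

```python
import re
import unicodedata

# Bengali Unicode block (U+0980 to U+09FF).
BENGALI_CHARS = re.compile(r"[\u0980-\u09FF]")


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def contains_bengali_script(text: str) -> bool:
    """Return True if any character already belongs to the Bengali block."""
    return BENGALI_CHARS.search(text) is not None


def split_into_chunks(text: str, max_words: int = 20) -> list[str]:
    """Split text into chunks of at most max_words whitespace-separated words.

    max_words=20 is an assumed value, not a documented model limit.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def prepare(text: str, max_words: int = 20) -> list[str]:
    """Normalize the input and return chunks; reject mixed-script input."""
    cleaned = normalize(text)
    if contains_bengali_script(cleaned):
        raise ValueError("input already contains Bengali script")
    return split_into_chunks(cleaned, max_words)
```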
---
## How to Use
### Example Code
```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Replace the placeholder repository ID with the actual model repository.
MODEL_ID = "your-username/banglish-to-bangla-mbart"
model = MBartForConditionalGeneration.from_pretrained(MODEL_ID)
tokenizer = MBart50TokenizerFast.from_pretrained(MODEL_ID)

def translate(text):
    # Tokenize the Banglish input; anything past 64 tokens is truncated.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=64)
    # Generate with beam search, passing input_ids and attention_mask together.
    outputs = model.generate(**inputs, max_length=64, num_beams=5, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translate("ami tomake valobashi"))