---
title: NMT demo
emoji: 👌
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: "5.19.0"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Neural Machine Translation for English-Hindi

This project implements an English-to-Hindi Neural Machine Translation system using a MarianMT model fine-tuned on a 100k-pair subset of the Samanantar corpus, served through a user-friendly Gradio interface.

## Features

- Unidirectional translation from English to Hindi
- User-friendly web interface built with Gradio
- Example translations included
- Built on Helsinki-NLP's opus-mt-en-hi MarianMT model

## Installation

### Local Setup with Virtual Environment

1. Clone the repository:
```bash
git clone https://github.com/yourusername/NLPA_Assignment_2_Group_54.git
cd NLPA_Assignment_2_Group_54
```
2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
```
3. Install the required packages:
```bash
pip install -r requirements.txt
```

## Usage

1. Make sure your virtual environment is activated.
2. Run the UI:
```bash
python nmt_ui.py
```
3. Open your browser and navigate to `http://localhost:7860`.

## Supported Language Pairs

- English -> Hindi (using the `rooftopcoder/opus-mt-en-hi-samanantar-100k` model)

## Training the Model

The `train.py` script fine-tunes the MarianMT model on the Samanantar dataset. It performs the following steps:

- Loads the Samanantar dataset (English-Hindi subset).
- Splits the dataset into training and validation sets.
- Tokenizes the dataset.
- Sets up training arguments optimized for GPU.
- Trains the model using the Hugging Face `Trainer` class.
- Saves the trained model to the specified directory.
- Uploads the trained model to the Hugging Face Hub.

To train the model, run:
```bash
python train.py
```
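
The steps above can be sketched as follows. This is a hypothetical reconstruction, not the contents of `train.py`: the Hub dataset id `ai4bharat/samanantar` with config `hi` and columns `src`/`tgt`, taking the first 100k rows, the 90/10 split, the 128-token limit, and every hyperparameter are assumptions. It uses `Seq2SeqTrainer`, the sequence-to-sequence subclass of `Trainer`, and keeps the heavy imports inside `main()` so the file can be imported without side effects.

```python
# Hypothetical sketch of the fine-tuning pipeline; NOT the project's train.py.
# Assumed (not stated in the README): dataset id/config/columns, subset
# selection, split ratio, max length, and all hyperparameters below.
BASE_MODEL = "Helsinki-NLP/opus-mt-en-hi"
HUB_REPO = "rooftopcoder/opus-mt-en-hi-samanantar-100k"
OUTPUT_DIR = "opus-mt-en-hi-samanantar-100k"
MAX_LEN = 128


def main() -> None:
    import torch
    from datasets import load_dataset
    from transformers import (
        DataCollatorForSeq2Seq,
        MarianMTModel,
        MarianTokenizer,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    # 1. Load the English-Hindi subset and keep 100k sentence pairs.
    raw = load_dataset("ai4bharat/samanantar", "hi", split="train")
    raw = raw.select(range(100_000))

    # 2. Split into training and validation sets (90/10, fixed seed).
    splits = raw.train_test_split(test_size=0.1, seed=42)

    # 3. Tokenize English sources and Hindi targets; text_target fills "labels".
    tokenizer = MarianTokenizer.from_pretrained(BASE_MODEL)

    def preprocess(batch):
        return tokenizer(
            batch["src"],
            text_target=batch["tgt"],
            max_length=MAX_LEN,
            truncation=True,
        )

    tokenized = splits.map(
        preprocess, batched=True, remove_columns=splits["train"].column_names
    )

    # 4. Training arguments; mixed precision only when a CUDA GPU is present.
    args = Seq2SeqTrainingArguments(
        output_dir=OUTPUT_DIR,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        learning_rate=2e-5,
        num_train_epochs=1,
        weight_decay=0.01,
        fp16=torch.cuda.is_available(),
        report_to="none",
    )

    # 5. Fine-tune; the collator pads inputs and labels per batch.
    model = MarianMTModel.from_pretrained(BASE_MODEL)
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    print(trainer.evaluate())  # validation loss on the held-out 10%

    # 6. Save model and tokenizer locally.
    trainer.save_model(OUTPUT_DIR)
    tokenizer.save_pretrained(OUTPUT_DIR)

    # 7. Upload to the Hub (requires a logged-in Hugging Face account).
    model.push_to_hub(HUB_REPO)
    tokenizer.push_to_hub(HUB_REPO)
```

Calling `main()` downloads the base checkpoint and the dataset and runs the full fine-tuning, which is only practical on a GPU; the final upload assumes you are logged in to the Hub (for example via `huggingface-cli login`). The `text_target=` argument requires a reasonably recent `transformers` release.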

## Testing the Model

The `model_test.py` script runs a quick check of the trained MarianMT model. It performs the following steps:

- Loads the trained model and tokenizer from the Hugging Face Hub.
- Translates a sample input text from English to Hindi.
- Prints the translated text.

To test the model, run:
```bash
python model_test.py
```
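
The test step can be approximated with a few lines of `transformers` code. This is a minimal sketch, not the contents of `model_test.py`: it assumes the Hub repository ships a Marian tokenizer (which needs the `sentencepiece` package), and the helper names `load` and `translate` are illustrative.

```python
# Minimal inference sketch (hypothetical helpers; not the project's model_test.py).
from functools import lru_cache

# Fine-tuned checkpoint named in this README.
MODEL_NAME = "rooftopcoder/opus-mt-en-hi-samanantar-100k"


@lru_cache(maxsize=1)
def load(model_name: str = MODEL_NAME):
    """Download (once) and cache the tokenizer and model."""
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def translate(texts: list, max_new_tokens: int = 128) -> list:
    """Translate a batch of English sentences to Hindi."""
    tokenizer, model = load()
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # generate() already runs without gradient tracking.
    generated = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, `translate(["How are you?"])` returns a one-element list holding the Hindi output. The first call downloads the checkpoint; later calls reuse the cached tokenizer and model.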

## User Interface

The `nmt_ui.py` script provides a Gradio-based user interface for translating English text to Hindi. The interface also offers an option to transliterate Romanized Hindi text into Devanagari script.

To launch the interface, run:
```bash
python nmt_ui.py
```

## Model Information

This project uses the MarianMT architecture from Hugging Face Transformers.

### Notes:

- The model translates from English to Hindi only.
- It is fine-tuned from the Helsinki-NLP/opus-mt-en-hi model.
- The interface also supports transliteration of Romanized Hindi text.

### Supported Features:

- English -> Hindi translation.
- Romanized Hindi -> Devanagari Hindi transliteration.

### Examples of Transliteration:

- "namaste" → "नमस्ते"
- "aap kaise ho" → "आप कैसे हो"
- "mera naam" → "मेरा नाम"
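
To illustrate what Romanized-to-Devanagari transliteration involves, here is a toy rule-based transliterator. It is hypothetical and not the method used in `nmt_ui.py`: it covers only a small ITRANS-like subset of letters, joins consonant clusters with a virama (halant), leaves a word-final consonant without a vowel sign, and applies one assumed heuristic (a word-final short "a" after a consonant is read as long "aa"). These rules are enough to reproduce the three examples above.

```python
# Toy rule-based Romanized Hindi -> Devanagari transliterator.
# Hypothetical illustration only; NOT the scheme used by nmt_ui.py.
from typing import Dict, Optional

VIRAMA = "\u094d"  # halant: suppresses the inherent vowel to form a cluster

# Romanized vowel -> (independent letter, dependent sign / matra).
# The inherent "a" has no matra, so its sign is an empty string.
VOWELS: Dict[str, tuple] = {
    "aa": ("आ", "ा"), "ai": ("ऐ", "ै"), "au": ("औ", "ौ"),
    "ee": ("ई", "ी"), "ii": ("ई", "ी"), "oo": ("ऊ", "ू"), "uu": ("ऊ", "ू"),
    "a": ("अ", ""), "e": ("ए", "े"), "i": ("इ", "ि"),
    "o": ("ओ", "ो"), "u": ("उ", "ु"),
}

CONSONANTS: Dict[str, str] = {
    "chh": "छ", "kh": "ख", "gh": "घ", "ch": "च", "jh": "झ", "th": "थ",
    "dh": "ध", "ph": "फ", "bh": "भ", "sh": "श",
    "k": "क", "g": "ग", "j": "ज", "t": "त", "d": "द", "n": "न",
    "p": "प", "b": "ब", "m": "म", "y": "य", "r": "र", "l": "ल",
    "v": "व", "w": "व", "s": "स", "h": "ह",
}


def _longest_match(table: Dict, word: str, i: int) -> Optional[str]:
    """Return the longest key of `table` found in `word` at index i, if any."""
    for key in sorted(table, key=len, reverse=True):
        if word.startswith(key, i):
            return key
    return None


def transliterate_word(word: str) -> str:
    out = []
    i = 0
    after_consonant = False
    while i < len(word):
        key = _longest_match(VOWELS, word, i)
        if key:
            independent, matra = VOWELS[key]
            if after_consonant:
                # Assumed heuristic: word-final short "a" is long ("mera").
                if key == "a" and i + 1 == len(word):
                    matra = VOWELS["aa"][1]
                out.append(matra)  # vowel sign attached to the consonant
            else:
                out.append(independent)  # vowel at word start or after a vowel
            after_consonant = False
            i += len(key)
            continue
        key = _longest_match(CONSONANTS, word, i)
        if key:
            if after_consonant:
                out.append(VIRAMA)  # consonant cluster, e.g. "st" in "namaste"
            out.append(CONSONANTS[key])
            after_consonant = True
            i += len(key)
            continue
        out.append(word[i])  # digits, punctuation, unsupported letters
        after_consonant = False
        i += 1
    # A trailing consonant simply gets no vowel sign (schwa deletion).
    return "".join(out)


def transliterate(text: str) -> str:
    return " ".join(transliterate_word(w) for w in text.lower().split())
```

Casual romanization is ambiguous (the same "a" can be short or long, and "th" can stand for त or थ), so hand-written rules like these break down quickly on real input; a dictionary- or model-based transliterator is the more robust design.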

## Project Structure

```
NLPA_Assignment_2_Group_54/
├── nmt_ui.py          # Main application file with Gradio interface
├── requirements.txt   # Python dependencies
└── README.md          # Project documentation
```

## License

MIT

## Group Members

- Shubhra J Gadhwala: 2023aa05750
- Sandeep Kumar Yadav: 2023ab05047
- Ravi Krishna Mayura: 2023ab05157
- Satheesh Kumar G: 2023ab05041