title: NMT demo
emoji: 👌
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 5.19.0
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Neural Machine Translation for English-Hindi
This project implements a Neural Machine Translation system for English-to-Hindi translation. It uses a MarianMT model fine-tuned on a 100k-sentence split of the Samanantar dataset and exposes it through a Gradio web interface.
Features
- Unidirectional translation from English to Hindi
- User-friendly web interface built with Gradio
- Example translations included
- Built on Helsinki-NLP's MarianMT model
Installation
Local Setup with Virtual Environment
- Clone the repository:
git clone https://github.com/yourusername/NLPA_Assignment_2_Group_54.git
cd NLPA_Assignment_2_Group_54
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
- Install the required packages:
pip install -r requirements.txt
Usage
- Make sure your virtual environment is activated
- Run the UI:
python nmt_ui.py
- Open your browser and navigate to http://localhost:7860
Supported Language Pairs
- English -> Hindi (using rooftopcoder/opus-mt-en-hi-samanantar-100k model)
Training the Model
The train.py script fine-tunes the MarianMT model on the Samanantar dataset. It performs the following steps:
- Loads the Samanantar dataset (English-Hindi subset).
- Splits the dataset into training and validation sets.
- Tokenizes the dataset.
- Sets up training arguments optimized for GPU.
- Trains the model using the Hugging Face Trainer class.
- Saves the trained model to the specified directory.
- Uploads the trained model to the Hugging Face Hub.
To train the model, run:
python train.py
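The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of train.py: the dataset ID `ai4bharat/samanantar`, its `hi` configuration, the `src`/`tgt` column names, the 90/10 split, and every hyperparameter value are assumptions, and the function names are hypothetical. Only the base model name and the use of the Trainer class come from this README.

```python
BASE_MODEL = "Helsinki-NLP/opus-mt-en-hi"   # base model named in this README
DATASET_ID = "ai4bharat/samanantar"         # assumption: Hub ID of Samanantar


def training_kwargs(output_dir, has_gpu):
    """Keyword arguments for TrainingArguments. All values are illustrative."""
    return {
        "output_dir": output_dir,
        "num_train_epochs": 1,
        "per_device_train_batch_size": 32 if has_gpu else 8,
        "per_device_eval_batch_size": 32 if has_gpu else 8,
        "learning_rate": 2e-5,
        "fp16": has_gpu,          # mixed precision only makes sense on a GPU
        "save_total_limit": 2,
    }


def train(output_dir="opus-mt-en-hi-finetuned", hub_repo=None, n_rows=100_000):
    # Heavy imports are kept inside the function so the module loads quickly.
    import torch
    from datasets import load_dataset
    from transformers import (DataCollatorForSeq2Seq, MarianMTModel,
                              MarianTokenizer, Trainer, TrainingArguments)

    tokenizer = MarianTokenizer.from_pretrained(BASE_MODEL)
    model = MarianMTModel.from_pretrained(BASE_MODEL)

    # Load the first n_rows English-Hindi pairs (assumed columns: src, tgt).
    raw = load_dataset(DATASET_ID, "hi", split=f"train[:{n_rows}]")
    splits = raw.train_test_split(test_size=0.1, seed=42)

    def tokenize(batch):
        # text_target tokenizes the Hindi side as labels.
        return tokenizer(batch["src"], text_target=batch["tgt"],
                         max_length=128, truncation=True)

    tokenized = splits.map(tokenize, batched=True,
                           remove_columns=raw.column_names)

    args = TrainingArguments(
        **training_kwargs(output_dir, torch.cuda.is_available()))
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()

    # Save locally, then optionally upload (requires a Hugging Face login).
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    if hub_repo:
        model.push_to_hub(hub_repo)
        tokenizer.push_to_hub(hub_repo)
```

Calling `train(hub_repo="your-username/your-model")` would run the whole pipeline; leaving `hub_repo` as `None` skips the upload step.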
Testing the Model
The model_test.py script tests the trained MarianMT model. It performs the following steps:
- Loads the trained model and tokenizer from the Hugging Face Hub.
- Translates a sample input text from English to Hindi.
- Prints the translated text.
To test the model, run:
python model_test.py
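A minimal sketch of what such a test could look like, assuming the model is loaded with the standard MarianMT classes from Transformers. The model ID is the one listed under Supported Language Pairs; the helper names `as_batch` and `translate` and the `max_new_tokens` value are hypothetical and not taken from model_test.py.

```python
MODEL_ID = "rooftopcoder/opus-mt-en-hi-samanantar-100k"  # from this README


def as_batch(text_or_texts):
    """Accept a single string or a list of strings; always return a list."""
    if isinstance(text_or_texts, str):
        return [text_or_texts]
    return list(text_or_texts)


def translate(text_or_texts, model_id=MODEL_ID, max_new_tokens=128):
    """Translate English text to Hindi. Downloads the model on first use."""
    # Imported lazily because loading transformers and torch is slow.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    batch = tokenizer(as_batch(text_or_texts), return_tensors="pt",
                      padding=True, truncation=True)
    generated = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, `print(translate("How are you?")[0])` would print the Hindi translation of a single sentence. The exact output depends on the model weights, so none is shown here.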
User Interface
The nmt_ui.py script provides a Gradio-based user interface for translating text from English to Hindi. The interface also offers transliteration of Romanized Hindi text into Devanagari script.
To launch the interface, run:
python nmt_ui.py
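For orientation, a Gradio interface of this kind can be wired up in a few lines. The sketch below is an assumption about the general shape, not the actual nmt_ui.py: the names `translate_or_empty` and `build_demo`, the labels, and the example sentences are all hypothetical. It expects a `translator` callable that maps a list of English strings to a list of Hindi strings.

```python
def translate_or_empty(text, translator):
    """Return the translation of text, or "" when the input is blank."""
    if not text or not text.strip():
        return ""
    return translator([text.strip()])[0]


def build_demo(translator):
    """Build a simple English -> Hindi text interface around translator."""
    import gradio as gr  # imported lazily; only needed when building the UI

    return gr.Interface(
        fn=lambda text: translate_or_empty(text, translator),
        inputs=gr.Textbox(lines=3, label="English"),
        outputs=gr.Textbox(label="Hindi"),
        title="English to Hindi NMT",
        examples=[["How are you?"], ["What is your name?"]],
    )
```

Given such a translator, `build_demo(translator).launch(server_port=7860)` would serve the interface at http://localhost:7860, the address given in the Usage section.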
Model Information
This project uses the MarianMT model from Hugging Face Transformers.
Notes:
- The model supports English-Hindi translation.
- Based on the Helsinki-NLP/opus-mt-en-hi model.
- Optimized for English -> Hindi translation pairs.
- Includes transliteration support for Romanized Hindi text.
Supported Features:
- English -> Hindi translation.
- Romanized Hindi -> Devanagari Hindi transliteration.
Examples of Transliteration:
- "namaste" → "नमस्ते"
- "aap kaise ho" → "आप कैसे हो"
- "mera naam" → "मेरा नाम"
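This README does not say how nmt_ui.py performs transliteration. Purely to illustrate the input and output behaviour shown in the examples above, here is a toy word-level lookup. The table and the `transliterate` function are hypothetical; a real system would need a proper transliteration model or library rather than a fixed dictionary.

```python
# Hypothetical lookup table built only from the examples in this README.
ROMAN_TO_DEVANAGARI = {
    "namaste": "नमस्ते",
    "aap": "आप",
    "kaise": "कैसे",
    "ho": "हो",
    "mera": "मेरा",
    "naam": "नाम",
}


def transliterate(text):
    """Replace each known Romanized word; leave unknown words unchanged."""
    words = text.lower().split()
    return " ".join(ROMAN_TO_DEVANAGARI.get(word, word) for word in words)


print(transliterate("aap kaise ho"))
```

Words missing from the table pass through in lowercase Roman script, which makes the limits of a dictionary approach easy to see.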
Project Structure
NLPA_Assignment_2_Group_54/
├── nmt_ui.py # Main application file with Gradio interface
├── requirements.txt # Python dependencies
└── README.md # Project documentation
License
MIT
Group Members
- Shubhra J Gadhwala: 2023aa05750
- Sandeep Kumar Yadav: 2023ab05047
- Ravi Krishna Mayura: 2023ab05157
- Satheesh Kumar G: 2023ab05041