metadata
title: NMT demo
emoji: 👌
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 5.19.0
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Neural Machine Translation for English-Hindi

This project implements a Neural Machine Translation (NMT) system for English-to-Hindi translation. It uses a MarianMT model fine-tuned on a 100k split of the Samanantar dataset and exposes it through a Gradio web interface.

NMT UI Screenshot

Features

  • Unidirectional translation from English to Hindi
  • User-friendly web interface built with Gradio
  • Example translations included
  • Built on Helsinki-NLP's MarianMT model

Installation

Local Setup with Virtual Environment

  1. Clone the repository:
git clone https://github.com/yourusername/NLPA_Assignment_2_Group_54.git
cd NLPA_Assignment_2_Group_54
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
  3. Install the required packages:
pip install -r requirements.txt

Usage

  1. Make sure your virtual environment is activated
  2. Run the UI:
python nmt_ui.py
  3. Open your browser and navigate to http://localhost:7860

Supported Language Pairs

  • English -> Hindi (using rooftopcoder/opus-mt-en-hi-samanantar-100k model)

Training the Model

The train.py script is used to train the MarianMT model on the Samanantar dataset. The script performs the following steps:

  • Loads the Samanantar dataset (English-Hindi subset).
  • Splits the dataset into training and validation sets.
  • Tokenizes the dataset.
  • Sets up training arguments optimized for GPU.
  • Trains the model using the Hugging Face Trainer class.
  • Saves the trained model to the specified directory.
  • Uploads the trained model to the Hugging Face Hub.

To train the model, run:

python train.py
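
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of train.py: the column names `src` and `tgt`, the hyperparameters, the output directory name, and the helper names `train_val_split`, `tokenize_batch`, and `build_trainer` are all assumptions made for the example. The Transformers imports are deferred so the data helpers can be used without the library installed.

```python
import random


def train_val_split(pairs, val_fraction=0.1, seed=42):
    """Shuffle (english, hindi) pairs and split them into train/validation lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def tokenize_batch(tokenizer, batch, max_length=128):
    """Tokenize source and target sentences in one call.

    The "src"/"tgt" keys are assumed column names, not confirmed by the README.
    """
    return tokenizer(
        batch["src"],
        text_target=batch["tgt"],
        max_length=max_length,
        truncation=True,
    )


def build_trainer(train_ds, val_ds,
                  base_model="Helsinki-NLP/opus-mt-en-hi",
                  output_dir="opus-mt-en-hi-finetuned"):  # hypothetical directory
    """Assemble a Seq2SeqTrainer; hyperparameters here are placeholders."""
    # Imported lazily: these need `transformers` and `torch` installed.
    import torch
    from transformers import (
        DataCollatorForSeq2Seq,
        MarianMTModel,
        MarianTokenizer,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = MarianTokenizer.from_pretrained(base_model)
    model = MarianMTModel.from_pretrained(base_model)
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=1,
        learning_rate=2e-5,
        fp16=torch.cuda.is_available(),  # mixed precision only when a GPU is present
        predict_with_generate=True,
    )
    return Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=val_ds,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
```

With a trainer built this way, `trainer.train()` runs the fine-tuning, `trainer.save_model(output_dir)` writes the weights locally, and `model.push_to_hub(...)` is one way to upload them to the Hugging Face Hub, assuming you are logged in.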

Testing the Model

The model_test.py script is used to test the trained MarianMT model. The script performs the following steps:

  • Loads the trained model and tokenizer from the Hugging Face Hub.
  • Translates a sample input text from English to Hindi.
  • Prints the translated text.

To test the model, run:

python model_test.py
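
As a rough sketch of the steps above (not the actual contents of model_test.py), loading the model from the Hub and translating a sentence might look like this. The model id is the one named in this README; the helper names `prepare_inputs` and `translate` and the `max_length` value are assumptions for illustration.

```python
MODEL_NAME = "rooftopcoder/opus-mt-en-hi-samanantar-100k"  # model id from this README


def prepare_inputs(text):
    """Accept a string or a list of strings; return stripped, non-empty sentences."""
    items = [text] if isinstance(text, str) else list(text)
    return [s.strip() for s in items if s and s.strip()]


def translate(text, model_name=MODEL_NAME, max_length=128):
    """Translate English text to Hindi and return a list of translations."""
    sentences = prepare_inputs(text)
    if not sentences:
        return []

    # Imported lazily: requires `transformers`, `torch`, and `sentencepiece`.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    batch = tokenizer(
        sentences,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    )
    generated = model.generate(**batch, max_length=max_length)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate("How are you?")` downloads the model on first use and returns a one-element list with the Hindi translation. For repeated calls it would be better to load the tokenizer and model once and reuse them.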

User Interface

The nmt_ui.py script provides a Gradio-based user interface for translating text from English to Hindi. The interface also includes an option to transliterate Romanized Hindi text into Devanagari script.

To launch the interface, run:

python nmt_ui.py
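
For orientation, a Gradio interface of this kind can be wired up along the following lines. This is a sketch under assumptions, not the code in nmt_ui.py: the helper names `make_translate_fn` and `build_interface`, the labels, and the example sentences are invented for illustration, and `translator` stands for any callable that maps an English string to a Hindi string.

```python
def make_translate_fn(translator):
    """Wrap a translator callable so that empty input yields an empty result."""
    def translate_text(text):
        text = (text or "").strip()
        if not text:
            return ""
        return translator(text)
    return translate_text


def build_interface(translate_text):
    """Build a simple text-in, text-out Gradio interface around translate_text."""
    import gradio as gr  # imported lazily; requires the `gradio` package

    return gr.Interface(
        fn=translate_text,
        inputs=gr.Textbox(lines=4, label="English"),
        outputs=gr.Textbox(lines=4, label="Hindi"),
        title="English-Hindi NMT demo",
        examples=[["How are you?"], ["What is your name?"]],  # placeholder examples
    )
```

Calling `build_interface(make_translate_fn(my_translator)).launch()` starts a local server, which Gradio serves on port 7860 by default.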

Model Information

This project uses the MarianMT model from Hugging Face Transformers.

Notes:

  • The model translates from English to Hindi only.
  • Based on the Helsinki-NLP/opus-mt-en-hi model.
  • Optimized for English -> Hindi translation pairs.
  • Includes transliteration support for Romanized Hindi text.

Supported Features:

  • English -> Hindi translation.
  • Romanized Hindi -> Devanagari Hindi transliteration.

Examples of Transliteration:

  • "namaste" → "नमस्ते"
  • "aap kaise ho" → "आप कैसे हो"
  • "mera naam" → "मेरा नाम"
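
To make the input/output behavior concrete, here is a toy word-level lookup that reproduces the examples above. It is only an illustration: the README does not say how the project performs transliteration, and a real system would use a proper transliteration model or library rather than a fixed table. The names `ROMAN_TO_DEVANAGARI`, `contains_devanagari`, and `transliterate_words` are hypothetical.

```python
# Illustrative table only; entries are taken from the examples in this README.
ROMAN_TO_DEVANAGARI = {
    "namaste": "नमस्ते",
    "aap": "आप",
    "kaise": "कैसे",
    "ho": "हो",
    "mera": "मेरा",
    "naam": "नाम",
}


def contains_devanagari(text):
    """Return True if any character falls in the Devanagari Unicode block."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


def transliterate_words(text):
    """Replace known Romanized Hindi words; leave unknown words unchanged."""
    if contains_devanagari(text):
        return text  # already in Devanagari, nothing to do
    words = text.split()
    return " ".join(ROMAN_TO_DEVANAGARI.get(w.lower(), w) for w in words)
```

Checking for Devanagari characters first avoids touching text that is already in the target script.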

Project Structure

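NLPA_Assignment_2_Group_54/
├── nmt_ui.py         # Main application file with Gradio interface
├── train.py          # Training script for the MarianMT model
├── model_test.py     # Test script for the trained model
├── requirements.txt  # Python dependencies
└── README.md         # Project documentation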

License

MIT

Group Members

  • Shubhra J Gadhwala: 2023aa05750
  • Sandeep Kumar Yadav: 2023ab05047
  • Ravi Krishna Mayura: 2023ab05157
  • Satheesh Kumar G: 2023ab05041