---
title: NMT demo
emoji: 👌
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: "5.19.0"
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Neural Machine Translation for English-Hindi
This project implements a Neural Machine Translation (NMT) system for English-to-Hindi translation. It uses a MarianMT model fine-tuned on a 100k-example subset of the Samanantar dataset and provides a user-friendly Gradio interface.

## Features
- Unidirectional translation from English to Hindi
- User-friendly web interface built with Gradio
- Example translations included
- Built on Helsinki-NLP's MarianMT model
## Installation
### Local Setup with Virtual Environment
1. Clone the repository:
```bash
git clone https://github.com/yourusername/NLPA_Assignment_2_Group_54.git
cd NLPA_Assignment_2_Group_54
```
2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
```
3. Install the required packages:
```bash
pip install -r requirements.txt
```
## Usage
1. Make sure your virtual environment is activated
2. Run the UI:
```bash
python nmt_ui.py
```
3. Open your browser and navigate to `http://localhost:7860`
## Supported Language Pairs
- English -> Hindi (using the `rooftopcoder/opus-mt-en-hi-samanantar-100k` model)
## Training the Model
The `train.py` script fine-tunes the MarianMT model on the Samanantar dataset. It performs the following steps:
- Loads the Samanantar dataset (English-Hindi subset).
- Splits the dataset into training and validation sets.
- Tokenizes the dataset.
- Sets up training arguments optimized for GPU.
- Trains the model using the Hugging Face `Trainer` class.
- Saves the trained model to the specified directory.
- Uploads the trained model to the Hugging Face Hub.
To train the model, run:
```bash
python train.py
```
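The steps above can be sketched with the Hugging Face `datasets` and `transformers` libraries. This is a minimal sketch, not the contents of `train.py`. Only the base model `Helsinki-NLP/opus-mt-en-hi` comes from this README. Everything else is an assumption:

- the dataset ID `ai4bharat/samanantar`, with config `"hi"` and English in `src` and Hindi in `tgt`
- the random 100k-example sample and the 90/10 split
- all hyperparameters
- the helper names `preprocess` and `fine_tune`

```python
# Hypothetical sketch of the fine-tuning steps; NOT the project's train.py.
BASE_MODEL = "Helsinki-NLP/opus-mt-en-hi"  # base model named in this README
DATASET_ID = "ai4bharat/samanantar"        # assumed Hub ID; config "hi" assumed
MAX_LENGTH = 128                           # assumed maximum sequence length


def preprocess(batch, tokenizer, max_length=MAX_LENGTH):
    """Tokenize English sources and Hindi targets (assumed columns "src"/"tgt")."""
    # text_target= tokenizes the Hindi side and stores it under "labels".
    return tokenizer(
        batch["src"],
        text_target=batch["tgt"],
        max_length=max_length,
        truncation=True,
    )


def fine_tune(output_dir="opus-mt-en-hi-samanantar-100k", num_examples=100_000, hub_repo_id=None):
    # Heavy imports stay inside the function so the module imports cheaply.
    import torch
    from datasets import load_dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Trainer,
        TrainingArguments,
    )

    # 1. Load the English-Hindi subset and keep a random sample (assumed sampling scheme).
    raw = load_dataset(DATASET_ID, "hi", split="train")
    raw = raw.shuffle(seed=42).select(range(num_examples))

    # 2. Hold out 10% for validation (assumed ratio).
    splits = raw.train_test_split(test_size=0.1, seed=42)

    # 3. Tokenize both splits, dropping the raw text columns.
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
    tokenized = splits.map(
        lambda batch: preprocess(batch, tokenizer),
        batched=True,
        remove_columns=splits["train"].column_names,
    )

    # 4. Training arguments (assumed values); fp16 only when a CUDA GPU is present.
    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        learning_rate=2e-5,
        num_train_epochs=3,
        fp16=torch.cuda.is_available(),
        save_total_limit=2,
        report_to="none",
    )

    # 5. Train with the generic Trainer; the collator pads inputs and labels per batch.
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    print(trainer.evaluate())

    # 6. Save the model and tokenizer locally.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)

    # 7. Optionally upload to the Hub (requires `huggingface-cli login` first).
    if hub_repo_id:
        model.push_to_hub(hub_repo_id)
        tokenizer.push_to_hub(hub_repo_id)
```

Under these assumptions, `fine_tune(hub_repo_id="<your-username>/<repo-name>")` would run all seven steps. Keeping the imports inside `fine_tune` lets `preprocess` be checked without downloading anything.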
## Testing the Model
The `model_test.py` script tests the trained MarianMT model. It performs the following steps:
- Loads the trained model and tokenizer from the Hugging Face Hub.
- Translates a sample input text from English to Hindi.
- Prints the translated text.
To test the model, run:
```bash
python model_test.py
```
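The following is a minimal sketch of such a test. It assumes the published model ID `rooftopcoder/opus-mt-en-hi-samanantar-100k` listed in this README and the standard `transformers` auto classes. The function names `load` and `translate` and the generation settings are illustrative and are not taken from `model_test.py`.

```python
# Hypothetical inference sketch; NOT the project's model_test.py.
from functools import lru_cache
from typing import List

MODEL_ID = "rooftopcoder/opus-mt-en-hi-samanantar-100k"  # model ID from this README


@lru_cache(maxsize=2)
def load(model_id: str = MODEL_ID):
    """Download (first call only) and cache the tokenizer and model."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def translate(texts: List[str], model_id: str = MODEL_ID, max_new_tokens: int = 128) -> List[str]:
    """Translate a batch of English sentences into Hindi."""
    tokenizer, model = load(model_id)
    # Pad to the longest sentence in the batch and truncate overly long inputs.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(["How are you?"])` returns a list containing one Hindi string. The first call downloads the weights, and `lru_cache` keeps the loaded model in memory for later calls.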
## User Interface
The `nmt_ui.py` script provides a Gradio-based user interface for translating English text into Hindi. It also offers an option to transliterate Romanized Hindi text into Devanagari script.
To launch the interface, run:
```bash
python nmt_ui.py
```
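One possible way to wire a translation mode and a transliteration mode into a single Gradio app is sketched below. This is an illustration, not the code in `nmt_ui.py`. The helper names `route` and `build_demo`, the mode labels, and the examples are all hypothetical. `translate_fn` can be any function that maps a list of English strings to a list of Hindi strings, for example one wrapping the published model. `transliterate_fn` can be any Romanized-to-Devanagari converter.

```python
# Hypothetical Gradio wiring; NOT the project's nmt_ui.py.
from typing import Callable, List

TRANSLATE = "Translate (English -> Hindi)"
TRANSLITERATE = "Transliterate (Romanized Hindi -> Devanagari)"


def route(
    text: str,
    mode: str,
    translate_fn: Callable[[List[str]], List[str]],
    transliterate_fn: Callable[[str], str],
) -> str:
    """Send the input to the translator or the transliterator based on the selected mode."""
    if not text.strip():
        return ""  # nothing to process
    if mode == TRANSLITERATE:
        return transliterate_fn(text)
    return translate_fn([text])[0]


def build_demo(translate_fn, transliterate_fn):
    import gradio as gr  # imported lazily so route() works without gradio installed

    return gr.Interface(
        fn=lambda text, mode: route(text, mode, translate_fn, transliterate_fn),
        inputs=[
            gr.Textbox(label="Input text", lines=3),
            gr.Radio(choices=[TRANSLATE, TRANSLITERATE], value=TRANSLATE, label="Mode"),
        ],
        outputs=gr.Textbox(label="Output"),
        examples=[["How are you?", TRANSLATE], ["aap kaise ho", TRANSLITERATE]],
        title="NMT demo",
    )
```

`build_demo(my_translate, my_transliterate).launch()` would then serve the app. Gradio listens on port 7860 by default, which matches the URL in the Usage section. Keeping the routing logic in a plain function makes it testable without starting a server.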
## Model Information
This project uses the MarianMT model from Hugging Face Transformers.
### Notes:
- The model supports English-Hindi translation.
- Based on the Helsinki-NLP/opus-mt-en-hi model.
- Optimized for English -> Hindi translation pairs.
- Includes transliteration support for Romanized Hindi text.
### Supported Features:
- English -> Hindi translation.
- Romanized Hindi -> Devanagari Hindi transliteration.
### Examples of Transliteration:
- "namaste" → "नमस्ते"
- "aap kaise ho" → "आप कैसे हो"
- "mera naam" → "मेरा नाम"
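The examples above define the input/output contract. This README does not say how `nmt_ui.py` implements transliteration. The sketch below is therefore only a toy dictionary-based, word-level transliterator, and its hypothetical `LEXICON` covers just these three examples. A practical system would need a rule-based scheme or a learned model to handle words outside the table.

```python
# Toy word-level lookup transliterator (hypothetical; NOT the method used in nmt_ui.py).
import re
from typing import Dict

# Hypothetical lexicon covering only the README examples.
LEXICON: Dict[str, str] = {
    "namaste": "नमस्ते",
    "aap": "आप",
    "kaise": "कैसे",
    "ho": "हो",
    "mera": "मेरा",
    "naam": "नाम",
}


def transliterate(text: str, lexicon: Dict[str, str] = LEXICON) -> str:
    """Replace each Latin-script word found in the lexicon with its Devanagari form.

    Unknown words, spacing, and punctuation are left unchanged.
    """
    def replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        return lexicon.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)
```

For instance, `transliterate("mera naam Ravi")` converts the two known words and leaves the out-of-vocabulary name untouched. That fallback is exactly where a lookup table stops being sufficient.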
## Project Structure
```
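NLPA_Assignment_2_Group_54/
├── nmt_ui.py          # Main application file with Gradio interface
├── train.py           # Fine-tunes MarianMT on Samanantar and uploads it to the Hub
├── model_test.py      # Loads the model from the Hub and runs a sample translation
├── requirements.txt   # Python dependencies
└── README.md          # Project documentation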
```
## License
MIT
## Group Members
- Shubhra J Gadhwala: 2023aa05750
- Sandeep Kumar Yadav: 2023ab05047
- Ravi Krishna Mayura: 2023ab05157
- Satheesh Kumar G: 2023ab05041