---
title: NMT demo
emoji: 👌
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: "5.19.0"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Neural Machine Translation for English-Hindi

This project implements a Neural Machine Translation (NMT) system for English-to-Hindi translation. It uses a MarianMT model fine-tuned on a 100k split of the Samanantar dataset and exposes it through a user-friendly Gradio interface.

![NMT UI Screenshot](assets/nmt_ui_screenshot.png)

## Features

- Unidirectional translation from English to Hindi
- User-friendly web interface built with Gradio
- Example translations included
- Built on Helsinki-NLP's MarianMT model

## Installation

### Local Setup with Virtual Environment

1. Clone the repository:
```bash
git clone https://github.com/yourusername/NLPA_Assignment_2_Group_54.git
cd NLPA_Assignment_2_Group_54
```

2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
```

3. Install the required packages:
```bash
pip install -r requirements.txt
```

## Usage

1. Make sure your virtual environment is activated
2. Run the UI:
```bash
python nmt_ui.py
```
3. Open your browser and navigate to `http://localhost:7860`

## Supported Language Pairs

- English -> Hindi (using the `rooftopcoder/opus-mt-en-hi-samanantar-100k` model)

## Training the Model

The `train.py` script trains the MarianMT model on the Samanantar dataset. It performs the following steps:
- Loads the Samanantar dataset (English-Hindi subset).
- Splits the dataset into training and validation sets.
- Tokenizes the dataset.
- Sets up training arguments optimized for GPU.
- Trains the model using the Hugging Face `Trainer` class.
- Saves the trained model to the specified directory.
- Uploads the trained model to the Hugging Face Hub.

To train the model, run:
```bash
python train.py
```
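
The steps listed above could look roughly like the following. This is a minimal sketch, not the contents of `train.py`: the dataset id `ai4bharat/samanantar`, its `hi` config, the `src`/`tgt` column names, the 90/10 split, and every hyperparameter are assumptions made for illustration. Only the base model name and the use of `Trainer` come from this README.

```python
BASE_MODEL = "Helsinki-NLP/opus-mt-en-hi"  # base model named in this README
DATASET_ID = "ai4bharat/samanantar"        # assumption: Hub id of Samanantar


def preprocess(batch, tokenizer, max_length=128):
    """Tokenize English sources and Hindi targets for one batch.

    Assumes the batch has "src" (English) and "tgt" (Hindi) columns.
    """
    return tokenizer(
        batch["src"],
        text_target=batch["tgt"],
        max_length=max_length,
        truncation=True,
    )


def train(output_dir="opus-mt-en-hi-samanantar-100k", n_pairs=100_000):
    # Heavy dependencies are imported lazily so preprocess() stays importable.
    from datasets import load_dataset
    from transformers import (
        DataCollatorForSeq2Seq,
        MarianMTModel,
        MarianTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = MarianTokenizer.from_pretrained(BASE_MODEL)
    model = MarianMTModel.from_pretrained(BASE_MODEL)

    # Load the English-Hindi subset and keep the first n_pairs sentence pairs.
    raw = load_dataset(DATASET_ID, "hi", split="train").select(range(n_pairs))

    # Hold out 10% for validation (the ratio is an assumption).
    splits = raw.train_test_split(test_size=0.1, seed=42)
    tokenized = splits.map(
        lambda b: preprocess(b, tokenizer),
        batched=True,
        remove_columns=raw.column_names,
    )

    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=32,  # illustrative values only
        num_train_epochs=1,
        learning_rate=2e-5,
        fp16=True,                       # requires a CUDA GPU
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    trainer.save_model(output_dir)
    trainer.push_to_hub()  # needs a prior `huggingface-cli login`
```

Calling `train()` downloads the dataset and the base model, so it is best run on a machine with a GPU and enough disk space.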

## Testing the Model

The `model_test.py` script tests the trained MarianMT model. It performs the following steps:
- Loads the trained model and tokenizer from the Hugging Face Hub.
- Translates a sample input text from English to Hindi.
- Prints the translated text.

To test the model, run:
```bash
python model_test.py
```
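
For orientation, loading the model from the Hub and translating a sentence might look like the sketch below. It is an illustration under assumptions, not the actual `model_test.py`: the function names `load` and `translate` and the `max_new_tokens` value are hypothetical, while the model id is the one listed under Supported Language Pairs.

```python
MODEL_ID = "rooftopcoder/opus-mt-en-hi-samanantar-100k"


def load(model_id=MODEL_ID):
    """Download (or load from cache) the tokenizer and model."""
    # Imported lazily so translate() can be exercised without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    return tokenizer, model


def translate(texts, tokenizer, model, max_new_tokens=128):
    """Translate a list of English sentences to Hindi."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


# Example usage (downloads the model on first run):
#   tokenizer, model = load()
#   print(translate(["How are you?"], tokenizer, model))
```

Passing the tokenizer and model in as arguments keeps `translate` easy to reuse from the UI without reloading the weights on every call.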

## User Interface

The `nmt_ui.py` script provides a Gradio-based user interface for translating text from English to Hindi. The interface also offers an option to transliterate Romanized Hindi text into Devanagari script.

To launch the interface, run:
```bash
python nmt_ui.py
```
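
A Gradio front end for a translator of this kind can be quite small. The sketch below is a hypothetical outline, not the code in `nmt_ui.py`: `make_handler`, `build_demo`, the labels, and the example sentences are all assumptions, and `translate_fn` stands for any callable that maps a list of English strings to a list of Hindi strings.

```python
def make_handler(translate_fn):
    """Wrap a batch translator as a single-text Gradio callback."""

    def handler(text):
        text = (text or "").strip()
        if not text:
            return ""  # skip the model call for empty input
        return translate_fn([text])[0]

    return handler


def build_demo(translate_fn):
    # Imported lazily so make_handler() can be tested without Gradio.
    import gradio as gr

    return gr.Interface(
        fn=make_handler(translate_fn),
        inputs=gr.Textbox(label="English", lines=3),
        outputs=gr.Textbox(label="Hindi", lines=3),
        examples=[["How are you?"], ["What is your name?"]],
        title="English to Hindi NMT",
    )


# Example usage, serving on the port mentioned in the Usage section:
#   build_demo(my_translate_fn).launch(server_port=7860)
```

Keeping the handler separate from the interface makes it possible to check the input handling without starting a web server.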

## Model Information

This project uses the MarianMT model from Hugging Face Transformers.

### Notes:
- The model translates in one direction only: English -> Hindi.
- It is based on the `Helsinki-NLP/opus-mt-en-hi` model.
- The interface includes transliteration support for Romanized Hindi text.

### Supported Features:
- English -> Hindi translation.
- Romanized Hindi -> Devanagari Hindi transliteration.

### Examples of Transliteration:
- "namaste" → "नमस्ते"
- "aap kaise ho" → "आप कैसे हो"
- "mera naam" → "मेरा नाम"
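
This README does not say how `nmt_ui.py` performs transliteration. Purely to illustrate the idea, the toy sketch below maps Roman letters to Devanagari with three heuristics: a consonant followed by another consonant gets a virama, a word-final consonant keeps its inherent vowel silent, and a word-final short "a" is written as the long vowel sign. The tables cover only the letters used in the three examples above, and every name here is hypothetical.

```python
# Toy tables: only the letters needed for the README's three examples.
CONSONANTS = {
    "k": "क", "n": "न", "m": "म", "s": "स",
    "t": "त", "p": "प", "h": "ह", "r": "र",
}
INDEPENDENT = {"aa": "आ", "ai": "ऐ", "a": "अ", "e": "ए", "o": "ओ"}
MATRAS = {"aa": "ा", "ai": "ै", "a": "", "e": "े", "o": "ो"}
VIRAMA = "्"
VOWELS = ("aa", "ai", "a", "e", "o")  # longest matches first


def _match_vowel(word, i):
    """Return the vowel spelling starting at position i, or None."""
    for v in VOWELS:
        if word.startswith(v, i):
            return v
    return None


def transliterate_word(word):
    out = []
    i, n = 0, len(word)
    while i < n:
        ch = word[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            i += 1
            v = _match_vowel(word, i)
            if v is not None:
                i += len(v)
                # Heuristic: a word-final short "a" is written as a long vowel.
                out.append(MATRAS["aa"] if (v == "a" and i == n) else MATRAS[v])
            elif i < n:
                out.append(VIRAMA)  # consonant cluster, e.g. "st"
            # A word-final consonant is left without a virama.
        else:
            v = _match_vowel(word, i)
            if v is None:
                out.append(ch)  # unknown character: pass through unchanged
                i += 1
            else:
                out.append(INDEPENDENT[v])
                i += len(v)
    return "".join(out)


def transliterate(text):
    """Transliterate a space-separated Romanized Hindi phrase."""
    return " ".join(transliterate_word(w) for w in text.lower().split())
```

A real system would need a much larger table and proper handling of aspirated consonants, nasalization, and ambiguous spellings, which is why dedicated transliteration libraries or models are normally used instead.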

## Project Structure

```
NLPA_Assignment_2_Group_54/
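├── assets/
│   └── nmt_ui_screenshot.png  # UI screenshot shown in this README
├── nmt_ui.py         # Main application file with Gradio interface
├── train.py          # Training script
├── model_test.py     # Script for testing the trained model
├── requirements.txt  # Python dependencies
└── README.md         # Project documentation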
```

## License

MIT

## Group Members

- Shubhra J Gadhwala: 2023aa05750
- Sandeep Kumar Yadav: 2023ab05047
- Ravi Krishna Mayura: 2023ab05157
- Satheesh Kumar G: 2023ab05041