	---
	title: NMT demo
	emoji: 👌
	colorFrom: red
	colorTo: blue
	sdk: gradio
	sdk_version: "5.19.0"
	app_file: app.py
	pinned: false
	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
	# Neural Machine Translation for English-Hindi

	This project implements a Neural Machine Translation (NMT) system that translates English to Hindi. It uses a MarianMT model fine-tuned on a 100k split of the Samanantar dataset and exposes it through a user-friendly Gradio interface.

	![NMT UI Screenshot](assets/nmt_ui_screenshot.png)

	## Features

	- Unidirectional translation from English to Hindi
	- User-friendly web interface built with Gradio
	- Example translations included
	- Built on Helsinki-NLP's MarianMT model

	## Installation

	### Local Setup with Virtual Environment

	1. Clone the repository:
	```bash
	git clone https://github.com/yourusername/NLPA_Assignment_2_Group_54.git
	cd NLPA_Assignment_2_Group_54
	```

	2. Create and activate a virtual environment:
	```bash
	python -m venv venv
	source venv/bin/activate # On Windows, use: venv\Scripts\activate
	```

	3. Install the required packages:
	```bash
	pip install -r requirements.txt
	```

	## Usage

	1. Make sure your virtual environment is activated
	2. Run the UI:
	```bash
	python nmt_ui.py
	```
	3. Open your browser and navigate to `http://localhost:7860`

	## Supported Language Pairs

	- English -> Hindi (using the `rooftopcoder/opus-mt-en-hi-samanantar-100k` model)

	## Training the Model

	The `train.py` script fine-tunes the MarianMT model on the Samanantar dataset. It performs the following steps:
	- Loads the Samanantar dataset (English-Hindi subset).
	- Splits the dataset into training and validation sets.
	- Tokenizes the dataset.
	- Sets up training arguments optimized for GPU.
	- Trains the model using the Hugging Face `Trainer` class.
	- Saves the trained model to the specified directory.
	- Uploads the trained model to the Hugging Face Hub.

	To train the model, run:
	```bash
	python train.py
	```
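The steps listed above could be sketched roughly as follows. This is not the contents of `train.py`: the dataset ID `ai4bharat/samanantar`, its `hi` config, the `src`/`tgt` column names, the way the 100k subset is drawn, every hyperparameter, and the output directory are assumptions made for illustration. Heavy imports live inside `run_training` so the `preprocess` helper can be tried on its own.

```python
def preprocess(batch, tokenizer, max_length=128):
    """Tokenize English sources and Hindi targets for seq2seq training.

    Assumes the batch has "src" (English) and "tgt" (Hindi) columns.
    """
    return tokenizer(
        batch["src"],
        text_target=batch["tgt"],  # target text is tokenized into "labels"
        max_length=max_length,
        truncation=True,
    )


def run_training(
    base_model="Helsinki-NLP/opus-mt-en-hi",
    output_dir="opus-mt-en-hi-samanantar-100k",  # hypothetical output path
    subset_size=100_000,
    val_fraction=0.1,  # assumed validation share
):
    # Lazy imports: only needed when training actually runs.
    from datasets import load_dataset
    from transformers import (
        DataCollatorForSeq2Seq,
        MarianMTModel,
        MarianTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = MarianTokenizer.from_pretrained(base_model)
    model = MarianMTModel.from_pretrained(base_model)

    # Load the English-Hindi subset (dataset ID and config are assumptions).
    raw = load_dataset("ai4bharat/samanantar", "hi", split="train")
    raw = raw.shuffle(seed=42).select(range(subset_size))

    # Split into training and validation sets.
    splits = raw.train_test_split(test_size=val_fraction, seed=42)

    # Tokenize both splits and drop the raw text columns.
    tokenized = splits.map(
        lambda b: preprocess(b, tokenizer),
        batched=True,
        remove_columns=splits["train"].column_names,
    )

    # Training arguments for a single CUDA GPU (values are illustrative).
    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        num_train_epochs=1,
        learning_rate=2e-5,
        fp16=True,          # mixed precision, needs a CUDA device
        push_to_hub=True,   # needs a logged-in Hugging Face account
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()

    # Save locally, then upload the output directory to the Hub.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    trainer.push_to_hub()
```

`DataCollatorForSeq2Seq` pads inputs and labels per batch, which is why `preprocess` does not pad.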

	## Testing the Model

	The `model_test.py` script runs a quick translation check against the trained MarianMT model. It performs the following steps:
	- Loads the trained model and tokenizer from the Hugging Face Hub.
	- Translates a sample input text from English to Hindi.
	- Prints the translated text.

	To test the model, run:
	```bash
	python model_test.py
	```
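A check like the one described above could look like the following sketch. It is not the contents of `model_test.py`; the function names `load_model` and `translate` and the `max_new_tokens` default are illustrative choices. The model ID is the one listed under Supported Language Pairs.

```python
MODEL_ID = "rooftopcoder/opus-mt-en-hi-samanantar-100k"


def load_model(model_id=MODEL_ID):
    """Download the tokenizer and model from the Hugging Face Hub."""
    # Lazy import so the module can be loaded without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    return tokenizer, model


def translate(texts, tokenizer, model, max_new_tokens=128):
    """Translate a list of English sentences to Hindi."""
    # Pad to the longest sentence so the batch forms one tensor.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch, max_new_tokens=max_new_tokens)
    # Drop padding and end-of-sequence tokens from the decoded text.
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(["How are you?"], *load_model())` downloads the model on first use and returns a one-element list with the Hindi translation.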

	## User Interface

	The `nmt_ui.py` script provides a Gradio-based user interface for translating English text to Hindi. The interface also offers transliteration of Romanized Hindi text into Devanagari script.

	To launch the interface, run:
	```bash
	python nmt_ui.py
	```
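A minimal sketch of how such an interface might be wired up with `gr.Interface` is shown below. It is not the actual `nmt_ui.py`: the helper names, labels, title, and example sentences are made up for illustration, the transliteration option is left out, and `translate_fn` stands for any function that takes one English string and returns Hindi text.

```python
def make_handler(translate_fn):
    """Wrap a translation function so blank input returns an empty string."""

    def handler(text):
        text = (text or "").strip()
        if not text:
            return ""  # skip the model call for empty input
        return translate_fn(text)

    return handler


def build_demo(translate_fn):
    """Build a single-textbox Gradio app around translate_fn."""
    import gradio as gr  # lazy import; only needed when the UI is built

    return gr.Interface(
        fn=make_handler(translate_fn),
        inputs=gr.Textbox(label="English", lines=3),
        outputs=gr.Textbox(label="Hindi", lines=3),
        examples=[["How are you?"], ["What is your name?"]],  # illustrative
        title="English to Hindi Translation",  # illustrative
    )
```

Calling `build_demo(my_translate).launch(server_port=7860)`, where `my_translate` is a hypothetical translation function, would serve the app at `http://localhost:7860`, which is also Gradio's default port.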

	## Model Information

	This project uses the MarianMT model from Hugging Face Transformers.

	### Notes:
	- The model translates in one direction only: English -> Hindi.
	- It is based on the `Helsinki-NLP/opus-mt-en-hi` model.
	- It is optimized for English -> Hindi translation pairs.
	- The interface adds transliteration support for Romanized Hindi text.

	### Supported Features:
	- English -> Hindi translation.
	- Romanized Hindi -> Devanagari Hindi transliteration.

	### Examples of Transliteration:
	- "namaste" → "नमस्ते"
	- "aap kaise ho" → "आप कैसे हो"
	- "mera naam" → "मेरा नाम"
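This README does not say how the transliteration is implemented. The toy sketch below only illustrates the input/output contract with a word-level lookup table built from the three examples above. The table and the `transliterate` function are hypothetical; a real transliterator would need to handle arbitrary words.

```python
# Word-level table built only from the README's examples (illustrative).
ROMAN_TO_DEVANAGARI = {
    "namaste": "नमस्ते",
    "aap": "आप",
    "kaise": "कैसे",
    "ho": "हो",
    "mera": "मेरा",
    "naam": "नाम",
}


def transliterate(text):
    """Map each known Romanized word to Devanagari; leave unknown words as-is."""
    return " ".join(
        ROMAN_TO_DEVANAGARI.get(word.lower(), word) for word in text.split()
    )
```

Unknown words pass through unchanged, so mixed input degrades gracefully instead of raising an error.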

	## Project Structure

	```
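	NLPA_Assignment_2_Group_54/
	├── nmt_ui.py # Main application file with Gradio interface
	├── train.py # Training script for the MarianMT model
	├── model_test.py # Test script for the trained model
	├── requirements.txt # Python dependencies
	└── README.md # Project documentation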
	```

	## License

	MIT

	## Group Members

	- Shubhra J Gadhwala: 2023aa05750
	- Sandeep Kumar Yadav: 2023ab05047
	- Ravi Krishna Mayura: 2023ab05157
	- Satheesh Kumar G: 2023ab05041