# BLEU Score Comparison for English-to-Japanese Translations

## Overview

This project demonstrates the calculation and visualization of BLEU scores for English-to-Japanese translation. The scores compare two models, an LSTM-based model and a Seq2Seq model, on how well each translates English input sentences into Japanese.

## Models Evaluated

1. **LSTM-based Model**
   - A simpler model that predicts translations token by token from a sequential structure.
   - Tends to perform moderately well but struggles with complex language patterns.
2. **Seq2Seq Model**
   - A more advanced encoder-decoder model designed for sequence-to-sequence tasks.
   - Expected to perform better because it can learn complex patterns and sentence-level context.

## Key Features

- Calculates BLEU scores using the SacreBLEU library.
- Visualizes BLEU scores as a bar chart for easy comparison.
- Saves the BLEU scores to a CSV file for further analysis.

## Implementation

### Steps in the Code

1. **Dataset Preparation**
   - The dataset contains English sentences and their corresponding Japanese translations (used as references).
   - Predictions from both the LSTM and Seq2Seq models are compared against these references.
2. **BLEU Score Calculation**
   - BLEU scores are computed with SacreBLEU to quantify the overlap between the model predictions and the ground-truth references (see the sketch after this list).
3. **Visualization**
   - BLEU scores are plotted as a bar chart for an intuitive comparison of model performance.
4. **Saving Results**
   - The BLEU scores for both models are saved to a CSV file named `bleu_scores.csv`.
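Below is a minimal sketch of steps 2–4, assuming corpus-level scoring with `sacrebleu.corpus_bleu`. The example sentences and the variable names `lstm_predictions` and `seq2seq_predictions` are hypothetical placeholders; the actual `main.py` may load data and structure the code differently.

```python
import csv

import matplotlib.pyplot as plt
import sacrebleu

# Hypothetical example data; main.py loads real model outputs instead.
lstm_predictions = ["猫はマットに座った。"]
seq2seq_predictions = ["猫がマットの上に座っている。"]
references = [["猫がマットの上に座っている。"]]  # one inner list per reference stream

# Corpus-level BLEU via SacreBLEU. The default tokenizer assumes
# space-delimited text; for raw Japanese, tokenize="ja-mecab" is
# usually more appropriate if MeCab is installed.
lstm_bleu = sacrebleu.corpus_bleu(lstm_predictions, references).score
seq2seq_bleu = sacrebleu.corpus_bleu(seq2seq_predictions, references).score

print("BLEU Score Comparison (English-to-Japanese):")
print(f"LSTM Model BLEU Score: {lstm_bleu:.2f}")
print(f"Seq2Seq Model BLEU Score: {seq2seq_bleu:.2f}")

# Bar chart comparing the two scores.
plt.bar(["LSTM", "Seq2Seq"], [lstm_bleu, seq2seq_bleu])
plt.ylabel("BLEU score")
plt.title("BLEU Score Comparison (English-to-Japanese)")
plt.show()

# Save the scores to a CSV file for further analysis.
with open("bleu_scores.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "bleu"])
    writer.writerow(["LSTM", f"{lstm_bleu:.2f}"])
    writer.writerow(["Seq2Seq", f"{seq2seq_bleu:.2f}"])
```

Note that SacreBLEU reports scores on a 0–100 scale, so the values can be written to the CSV and plotted directly without rescaling.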

## Files

- `main.py`: The primary Python script containing the code for BLEU score calculation, visualization, and saving of results.
- `bleu_scores.csv`: Output file containing the BLEU scores for both models.

## Requirements

### Dependencies

- Python 3.x
- Libraries:
  - `sacrebleu`
  - `matplotlib`
  - `csv` (part of the Python standard library, so it needs no installation)

To install the required dependencies, run:

```bash
pip install sacrebleu matplotlib
```
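Note: SacreBLEU's default tokenizer is designed for space-delimited languages. When scoring raw Japanese text, the optional `ja-mecab` tokenizer (passed as `tokenize="ja-mecab"`) is usually more appropriate; assuming the standard SacreBLEU extras layout, it can be installed with `pip install "sacrebleu[ja]"`.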

## Usage

1. Clone this repository and navigate to the project directory.
2. Run the script:

   ```bash
   python main.py
   ```

3. View the BLEU scores printed in the console and the generated bar chart.
4. Check the `bleu_scores.csv` file for the saved results.

## Results

- The BLEU scores for both models are displayed in the console and visualized in the bar chart.
- Example output:

  ```text
  BLEU Score Comparison (English-to-Japanese):
  LSTM Model BLEU Score: 45.32
  Seq2Seq Model BLEU Score: 70.25
  BLEU scores have been saved to bleu_scores.csv
  ```

## Acknowledgments

This project uses the SacreBLEU library for BLEU score calculation and Matplotlib for visualization.