---
library_name: transformers
license: apache-2.0
base_model: google/mt5-small
tags:
  - summarization
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: mt5-small
    results: []
datasets:
  - srvmishra832/multilingual-amazon-reviews-6-languages
language:
  - en
  - de
---

# Amazon_MultiLingual_Review_Summarization_with_google_mT5_small

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the Multilingual Amazon Reviews dataset. It achieves the following results on the evaluation set:

- Loss: 2.9368
- Model Preparation Time: 0.0038
- Rouge1: 16.1955
- Rouge2: 8.1292
- Rougel: 15.9218
- Rougelsum: 15.9516

## Model description

A fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small), the small variant of Google's multilingual T5 (mT5) model, adapted here for product review summarization in English and German.
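
For reference, a minimal sketch of loading the base checkpoint with the standard `transformers` auto classes (the fine-tuned weights in this repository load the same way, with this repository's id in place of the base id):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the base mT5-small checkpoint that this model was fine-tuned from.
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
```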

## Intended uses & limitations

Multilingual product review summarization. Supported languages: English and German.
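
A minimal inference sketch using the `summarization` pipeline. The repository id below is an assumption; substitute this model repository's actual id:

```python
from transformers import pipeline

# Hypothetical repo id -- replace with this model repository's actual id.
summarizer = pipeline("summarization", model="srvmishra832/mt5-small")

review = (
    "I bought this laptop a month ago. The battery lasts all day, the screen "
    "is sharp, and it handles everyday work without any slowdown."
)
print(summarizer(review, max_length=30, min_length=5)[0]["summary_text"])
```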

## Training and evaluation data

The original multilingual Amazon product reviews dataset that was available on the Hugging Face Hub is defunct, so we use the version available on Kaggle.

The original dataset covers six languages: English, German, French, Spanish, Japanese, and Chinese.

Each language has 20,000 training samples, 5,000 validation samples, and 5,000 testing samples.

We upload this dataset to the Hugging Face Hub at [srvmishra832/multilingual-amazon-reviews-6-languages](https://huggingface.co/datasets/srvmishra832/multilingual-amazon-reviews-6-languages).

Here, we select only the English and German reviews in the `pc` and `electronics` product categories.

We use the review titles as summaries, and to prevent the model from generating very short summaries, we filter out examples with extremely short review titles.

Finally, we downsample the resulting dataset so that training is feasible on a Google Colab T4 GPU in a reasonable amount of time.

The final downsampled and concatenated dataset contains 8,000 training samples, 452 validation samples, and 422 test samples.
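
The preprocessing described above can be sketched as follows. The column names (`language`, `product_category`, `review_title`) and the minimum-title-length threshold are assumptions, not verified against the actual dataset schema:

```python
from datasets import load_dataset

ds = load_dataset("srvmishra832/multilingual-amazon-reviews-6-languages")

def keep(example):
    # Column names and the 3-word threshold are assumptions for illustration.
    return (
        example["language"] in {"en", "de"}
        and example["product_category"] in {"pc", "electronics"}
        and len(example["review_title"].split()) >= 3  # drop very short titles
    )

filtered = ds.filter(keep)

# Downsample the training split so fine-tuning fits on a Colab T4 GPU.
train_small = filtered["train"].shuffle(seed=42).select(range(8000))
```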

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5.6e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
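
The list above maps onto `Seq2SeqTrainingArguments` roughly as in the sketch below; arguments not listed (e.g. `output_dir`, evaluation cadence) are assumptions left close to their defaults:

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-amazon-reviews",  # hypothetical output path
    learning_rate=5.6e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",           # AdamW with betas/epsilon at their defaults
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="epoch",         # the table below reports per-epoch metrics
    predict_with_generate=True,    # generate summaries so ROUGE can be computed
)
```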

### Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Rouge1  | Rouge2 | Rougel  | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|:-------:|:------:|:-------:|:---------:|
| 9.0889        | 1.0   | 500  | 3.4117          | 0.0038                 | 12.541  | 5.1023 | 11.9039 | 11.8749   |
| 4.3977        | 2.0   | 1000 | 3.1900          | 0.0038                 | 15.342  | 6.747  | 14.9223 | 14.8598   |
| 3.9595        | 3.0   | 1500 | 3.0817          | 0.0038                 | 15.3976 | 6.2063 | 15.0635 | 15.069    |
| 3.7525        | 4.0   | 2000 | 3.0560          | 0.0038                 | 15.7991 | 6.8536 | 15.4657 | 15.5263   |
| 3.6191        | 5.0   | 2500 | 3.0048          | 0.0038                 | 16.3791 | 7.3671 | 16.0817 | 16.059    |
| 3.5155        | 6.0   | 3000 | 2.9779          | 0.0038                 | 16.2311 | 7.5629 | 15.7492 | 15.758    |
| 3.4497        | 7.0   | 3500 | 2.9663          | 0.0038                 | 16.2554 | 8.1464 | 15.9499 | 15.9152   |
| 3.3889        | 8.0   | 4000 | 2.9438          | 0.0038                 | 16.5764 | 8.3698 | 16.3225 | 16.2848   |
| 3.3656        | 9.0   | 4500 | 2.9365          | 0.0038                 | 16.1416 | 8.0266 | 15.8921 | 15.8913   |
| 3.3562        | 10.0  | 5000 | 2.9368          | 0.0038                 | 16.1955 | 8.1292 | 15.9218 | 15.9516   |
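
A sketch of computing ROUGE scores on this 0-100 scale with the `evaluate` library (an assumption; the exact evaluation code is not part of this card):

```python
import evaluate  # the rouge metric also requires the `rouge_score` package

rouge = evaluate.load("rouge")

# Toy predictions/references for illustration only.
predictions = ["great battery life"]
references = ["excellent battery and great screen"]

scores = rouge.compute(predictions=predictions, references=references)
print({k: round(v * 100, 4) for k, v in scores.items()})
```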

### Framework versions

- Transformers 4.50.0
- PyTorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1