---
library_name: transformers
license: apache-2.0
base_model: google/mt5-small
tags:
  - summarization
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: mt5-small
    results: []
datasets:
  - srvmishra832/multilingual-amazon-reviews-6-languages
language:
  - en
  - de
---

# Amazon_MultiLingual_Review_Summarization_with_google_mT5_small

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the Multilingual Amazon Reviews dataset. It achieves the following results on the evaluation set:

- Loss: 2.9368
- Model Preparation Time: 0.0038
- Rouge1: 16.1955
- Rouge2: 8.1292
- Rougel: 15.9218
- Rougelsum: 15.9516

## Model description

A fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small), the small variant of Google's multilingual T5 (mT5) model, adapted here for product review summarization in English and German.
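
For reference, a minimal sketch of loading the base checkpoint with the standard `transformers` auto classes (the fine-tuned weights in this repository load the same way, with this repository's id in place of the base id):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the base mT5-small checkpoint that this model was fine-tuned from.
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
```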

## Intended uses & limitations

Multilingual product review summarization. Supported languages: English and German.
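
A minimal inference sketch using the `summarization` pipeline. The repository id below is an assumption; substitute this model repository's actual id:

```python
from transformers import pipeline

# Hypothetical repo id -- replace with this model repository's actual id.
summarizer = pipeline("summarization", model="srvmishra832/mt5-small")

review = (
    "I bought this laptop a month ago. The battery lasts all day, the screen "
    "is sharp, and it handles everyday work without any slowdown."
)
print(summarizer(review, max_length=30, min_length=5)[0]["summary_text"])
```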

## Training and evaluation data

The original multilingual Amazon product reviews dataset that was available on the Hugging Face Hub is defunct, so we use the version available on Kaggle.

The original dataset covers six languages: English, German, French, Spanish, Japanese, and Chinese.

Each language has 20,000 training samples, 5,000 validation samples, and 5,000 testing samples.

We upload this dataset to the Hugging Face Hub at [srvmishra832/multilingual-amazon-reviews-6-languages](https://huggingface.co/datasets/srvmishra832/multilingual-amazon-reviews-6-languages).

Here, we select only the English and German reviews in the `pc` and `electronics` product categories.

We use the review titles as summaries, and to prevent the model from generating very short summaries, we filter out examples with extremely short review titles.

Finally, we downsample the resulting dataset so that training is feasible on a Google Colab T4 GPU in a reasonable amount of time.

The final downsampled and concatenated dataset contains 8,000 training samples, 452 validation samples, and 422 test samples.
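
The preprocessing described above can be sketched as follows. The column names (`language`, `product_category`, `review_title`) and the minimum-title-length threshold are assumptions, not verified against the actual dataset schema:

```python
from datasets import load_dataset

ds = load_dataset("srvmishra832/multilingual-amazon-reviews-6-languages")

def keep(example):
    # Column names and the 3-word threshold are assumptions for illustration.
    return (
        example["language"] in {"en", "de"}
        and example["product_category"] in {"pc", "electronics"}
        and len(example["review_title"].split()) >= 3  # drop very short titles
    )

filtered = ds.filter(keep)

# Downsample the training split so fine-tuning fits on a Colab T4 GPU.
train_small = filtered["train"].shuffle(seed=42).select(range(8000))
```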

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5.6e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
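
The list above maps onto `Seq2SeqTrainingArguments` roughly as in the sketch below; arguments not listed (e.g. `output_dir`, evaluation cadence) are assumptions left close to their defaults:

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-amazon-reviews",  # hypothetical output path
    learning_rate=5.6e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",           # AdamW with betas/epsilon at their defaults
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="epoch",         # the table below reports per-epoch metrics
    predict_with_generate=True,    # generate summaries so ROUGE can be computed
)
```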

### Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Rouge1  | Rouge2 | Rougel  | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|:-------:|:------:|:-------:|:---------:|
| 9.0889        | 1.0   | 500  | 3.4117          | 0.0038                 | 12.541  | 5.1023 | 11.9039 | 11.8749   |
| 4.3977        | 2.0   | 1000 | 3.1900          | 0.0038                 | 15.342  | 6.747  | 14.9223 | 14.8598   |
| 3.9595        | 3.0   | 1500 | 3.0817          | 0.0038                 | 15.3976 | 6.2063 | 15.0635 | 15.069    |
| 3.7525        | 4.0   | 2000 | 3.0560          | 0.0038                 | 15.7991 | 6.8536 | 15.4657 | 15.5263   |
| 3.6191        | 5.0   | 2500 | 3.0048          | 0.0038                 | 16.3791 | 7.3671 | 16.0817 | 16.059    |
| 3.5155        | 6.0   | 3000 | 2.9779          | 0.0038                 | 16.2311 | 7.5629 | 15.7492 | 15.758    |
| 3.4497        | 7.0   | 3500 | 2.9663          | 0.0038                 | 16.2554 | 8.1464 | 15.9499 | 15.9152   |
| 3.3889        | 8.0   | 4000 | 2.9438          | 0.0038                 | 16.5764 | 8.3698 | 16.3225 | 16.2848   |
| 3.3656        | 9.0   | 4500 | 2.9365          | 0.0038                 | 16.1416 | 8.0266 | 15.8921 | 15.8913   |
| 3.3562        | 10.0  | 5000 | 2.9368          | 0.0038                 | 16.1955 | 8.1292 | 15.9218 | 15.9516   |
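
A sketch of computing ROUGE scores on this 0-100 scale with the `evaluate` library (an assumption; the exact evaluation code is not part of this card):

```python
import evaluate  # the rouge metric also requires the `rouge_score` package

rouge = evaluate.load("rouge")

# Toy predictions/references for illustration only.
predictions = ["great battery life"]
references = ["excellent battery and great screen"]

scores = rouge.compute(predictions=predictions, references=references)
print({k: round(v * 100, 4) for k, v in scores.items()})
```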

### Framework versions

- Transformers 4.50.0
- PyTorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1