---
library_name: transformers
license: apache-2.0
base_model: google/mt5-small
tags:
- summarization
- generated_from_trainer
metrics:
- rouge
model-index:
- name: mt5-small
results: []
datasets:
- srvmishra832/multilingual-amazon-reviews-6-languages
language:
- en
- de
---
# Amazon_MultiLingual_Review_Summarization_with_google_mT5_small
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on a multilingual Amazon Reviews dataset.
It achieves the following results on the evaluation set:
- Loss: 2.9368
- Model Preparation Time: 0.0038
- Rouge1: 16.1955
- Rouge2: 8.1292
- Rougel: 15.9218
- Rougelsum: 15.9516
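
The ROUGE values above are reported on a 0-100 scale. A hedged sketch of how such scores can be computed with the `evaluate` library (the prediction and reference strings below are purely illustrative, not taken from the evaluation set):

```python
import evaluate

rouge = evaluate.load("rouge")

# Illustrative strings only; during evaluation these come from model
# generations and the held-out review titles.
predictions = ["great laptop stand for the price"]
references = ["sturdy and affordable laptop stand"]

scores = rouge.compute(predictions=predictions, references=references)
# `evaluate` returns fractions in [0, 1]; the scores above are scaled by 100.
print({k: round(v * 100, 4) for k, v in scores.items()})
```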
## Model description
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small), the smallest checkpoint of mT5, a multilingual variant of T5 pretrained on the mC4 corpus covering 101 languages.
## Intended uses & limitations
The model performs multilingual product review summarization: given a product review, it generates a short, title-style summary. Supported languages: English and German.
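
A minimal inference sketch using the `transformers` summarization pipeline; the model id below is a placeholder for this repository's Hub id, and the generation lengths are assumptions rather than values from training:

```python
from transformers import pipeline

# Placeholder model id: replace with this repository's id on the Hub.
summarizer = pipeline("summarization", model="srvmishra832/mt5-small")

review = (
    "I bought this laptop stand for my home office. It is sturdy, easy to "
    "adjust, and keeps my screen at eye level, but the assembly instructions "
    "were confusing."
)

# max_length/min_length are assumed values, not taken from the training setup.
print(summarizer(review, max_length=32, min_length=5)[0]["summary_text"])
```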
## Training and evaluation data
The original multilingual Amazon product reviews dataset on the Hugging Face Hub is defunct, so we use the version available on [Kaggle](https://www.kaggle.com/datasets/mexwell/amazon-reviews-multi).
The original dataset covers 6 languages: English, German, French, Spanish, Japanese, and Chinese.
Each language has 20,000 training samples, 5,000 validation samples, and 5,000 testing samples.
We upload this dataset to the Hugging Face Hub at [srvmishra832/multilingual-amazon-reviews-6-languages](https://huggingface.co/datasets/srvmishra832/multilingual-amazon-reviews-6-languages).
Here, we select only the English and German reviews for the `pc` and `electronics` product categories.
We use the review titles as summaries and, to prevent the model from generating very short summaries, filter out examples with extremely short review titles.
Finally, we downsample the resulting dataset so that training is feasible on a Google Colab T4 GPU in a reasonable amount of time.
The final downsampled and concatenated dataset contains 8,000 training samples, 452 validation samples, and 422 test samples.
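
A sketch of this selection step, assuming the dataset keeps the original Amazon Reviews Multi column names (`language`, `product_category`, `review_title`) and using an assumed minimum-title-length threshold:

```python
from datasets import load_dataset

# Load the re-uploaded 6-language Amazon reviews dataset from the Hub.
dataset = load_dataset("srvmishra832/multilingual-amazon-reviews-6-languages")

# Keep only English/German reviews from the `pc` and `electronics` categories
# and drop examples whose review titles are extremely short.
MIN_TITLE_WORDS = 4  # assumed threshold, not recorded in this card

def keep_example(example):
    return (
        example["language"] in {"en", "de"}
        and example["product_category"] in {"pc", "electronics"}
        and len(example["review_title"].split()) >= MIN_TITLE_WORDS
    )

filtered = dataset.filter(keep_example)
```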
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
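
A sketch of how these hyperparameters map onto `Seq2SeqTrainingArguments`; the output directory, evaluation strategy, and generation flag are assumptions rather than values recorded in this card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-amazon-review-summarization",  # assumed name
    learning_rate=5.6e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",         # AdamW with betas=(0.9, 0.999), epsilon=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="epoch",       # assumed: the results table reports metrics once per epoch
    predict_with_generate=True,  # assumed: needed to compute ROUGE on generated summaries
)
```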
### Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|:-------:|:------:|:-------:|:---------:|
| 9.0889 | 1.0 | 500 | 3.4117 | 0.0038 | 12.541 | 5.1023 | 11.9039 | 11.8749 |
| 4.3977 | 2.0 | 1000 | 3.1900 | 0.0038 | 15.342 | 6.747 | 14.9223 | 14.8598 |
| 3.9595 | 3.0 | 1500 | 3.0817 | 0.0038 | 15.3976 | 6.2063 | 15.0635 | 15.069 |
| 3.7525 | 4.0 | 2000 | 3.0560 | 0.0038 | 15.7991 | 6.8536 | 15.4657 | 15.5263 |
| 3.6191 | 5.0 | 2500 | 3.0048 | 0.0038 | 16.3791 | 7.3671 | 16.0817 | 16.059 |
| 3.5155 | 6.0 | 3000 | 2.9779 | 0.0038 | 16.2311 | 7.5629 | 15.7492 | 15.758 |
| 3.4497 | 7.0 | 3500 | 2.9663 | 0.0038 | 16.2554 | 8.1464 | 15.9499 | 15.9152 |
| 3.3889 | 8.0 | 4000 | 2.9438 | 0.0038 | 16.5764 | 8.3698 | 16.3225 | 16.2848 |
| 3.3656 | 9.0 | 4500 | 2.9365 | 0.0038 | 16.1416 | 8.0266 | 15.8921 | 15.8913 |
| 3.3562 | 10.0 | 5000 | 2.9368 | 0.0038 | 16.1955 | 8.1292 | 15.9218 | 15.9516 |
### Framework versions
- Transformers 4.50.0
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1