---
library_name: transformers
license: apache-2.0
base_model: google/mt5-small
tags:
- summarization
- generated_from_trainer
metrics:
- rouge
model-index:
- name: mt5-small
results: []
datasets:
- srvmishra832/multilingual-amazon-reviews-6-languages
language:
- en
- de
---
# Amazon_MultiLingual_Review_Summarization_with_google_mT5_small
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on a multilingual Amazon Reviews dataset.
It achieves the following results on the evaluation set:
- Loss: 2.9368
- Model Preparation Time: 0.0038
- Rouge1: 16.1955
- Rouge2: 8.1292
- Rougel: 15.9218
- Rougelsum: 15.9516
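
The ROUGE values above are reported on a 0-100 scale. A hedged sketch of how such scores can be computed with the `evaluate` library (the prediction and reference strings below are purely illustrative, not taken from the evaluation set):

```python
import evaluate

rouge = evaluate.load("rouge")

# Illustrative strings only; during evaluation these come from model
# generations and the held-out review titles.
predictions = ["great laptop stand for the price"]
references = ["sturdy and affordable laptop stand"]

scores = rouge.compute(predictions=predictions, references=references)
# `evaluate` returns fractions in [0, 1]; the scores above are scaled by 100.
print({k: round(v * 100, 4) for k, v in scores.items()})
```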
## Model description
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small), the smallest checkpoint of mT5, a multilingual variant of T5 pretrained on the mC4 corpus covering 101 languages.
## Intended uses & limitations
The model performs multilingual product review summarization: given a product review, it generates a short, title-style summary. Supported languages: English and German.
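
A minimal inference sketch using the `transformers` summarization pipeline; the model id below is a placeholder for this repository's Hub id, and the generation lengths are assumptions rather than values from training:

```python
from transformers import pipeline

# Placeholder model id: replace with this repository's id on the Hub.
summarizer = pipeline("summarization", model="srvmishra832/mt5-small")

review = (
    "I bought this laptop stand for my home office. It is sturdy, easy to "
    "adjust, and keeps my screen at eye level, but the assembly instructions "
    "were confusing."
)

# max_length/min_length are assumed values, not taken from the training setup.
print(summarizer(review, max_length=32, min_length=5)[0]["summary_text"])
```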
## Training and evaluation data
The original multilingual Amazon product reviews dataset on the Hugging Face Hub is defunct, so we use the version available on [Kaggle](https://www.kaggle.com/datasets/mexwell/amazon-reviews-multi).
The original dataset covers 6 languages: English, German, French, Spanish, Japanese, and Chinese.
Each language has 20,000 training samples, 5,000 validation samples, and 5,000 testing samples.
We upload this dataset to the Hugging Face Hub at [srvmishra832/multilingual-amazon-reviews-6-languages](https://huggingface.co/datasets/srvmishra832/multilingual-amazon-reviews-6-languages).
Here, we select only the English and German reviews for the `pc` and `electronics` product categories.
We use the review titles as summaries and, to prevent the model from generating very short summaries, filter out examples with extremely short review titles.
Finally, we downsample the resulting dataset so that training is feasible on a Google Colab T4 GPU in a reasonable amount of time.
The final downsampled and concatenated dataset contains 8,000 training samples, 452 validation samples, and 422 test samples.
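
A sketch of this selection step, assuming the dataset keeps the original Amazon Reviews Multi column names (`language`, `product_category`, `review_title`) and using an assumed minimum-title-length threshold:

```python
from datasets import load_dataset

# Load the re-uploaded 6-language Amazon reviews dataset from the Hub.
dataset = load_dataset("srvmishra832/multilingual-amazon-reviews-6-languages")

# Keep only English/German reviews from the `pc` and `electronics` categories
# and drop examples whose review titles are extremely short.
MIN_TITLE_WORDS = 4  # assumed threshold, not recorded in this card

def keep_example(example):
    return (
        example["language"] in {"en", "de"}
        and example["product_category"] in {"pc", "electronics"}
        and len(example["review_title"].split()) >= MIN_TITLE_WORDS
    )

filtered = dataset.filter(keep_example)
```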
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
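
A sketch of how these hyperparameters map onto `Seq2SeqTrainingArguments`; the output directory, evaluation strategy, and generation flag are assumptions rather than values recorded in this card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-amazon-review-summarization",  # assumed name
    learning_rate=5.6e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",         # AdamW with betas=(0.9, 0.999), epsilon=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="epoch",       # assumed: the results table reports metrics once per epoch
    predict_with_generate=True,  # assumed: needed to compute ROUGE on generated summaries
)
```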
### Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|:-------:|:------:|:-------:|:---------:|
| 9.0889 | 1.0 | 500 | 3.4117 | 0.0038 | 12.541 | 5.1023 | 11.9039 | 11.8749 |
| 4.3977 | 2.0 | 1000 | 3.1900 | 0.0038 | 15.342 | 6.747 | 14.9223 | 14.8598 |
| 3.9595 | 3.0 | 1500 | 3.0817 | 0.0038 | 15.3976 | 6.2063 | 15.0635 | 15.069 |
| 3.7525 | 4.0 | 2000 | 3.0560 | 0.0038 | 15.7991 | 6.8536 | 15.4657 | 15.5263 |
| 3.6191 | 5.0 | 2500 | 3.0048 | 0.0038 | 16.3791 | 7.3671 | 16.0817 | 16.059 |
| 3.5155 | 6.0 | 3000 | 2.9779 | 0.0038 | 16.2311 | 7.5629 | 15.7492 | 15.758 |
| 3.4497 | 7.0 | 3500 | 2.9663 | 0.0038 | 16.2554 | 8.1464 | 15.9499 | 15.9152 |
| 3.3889 | 8.0 | 4000 | 2.9438 | 0.0038 | 16.5764 | 8.3698 | 16.3225 | 16.2848 |
| 3.3656 | 9.0 | 4500 | 2.9365 | 0.0038 | 16.1416 | 8.0266 | 15.8921 | 15.8913 |
| 3.3562 | 10.0 | 5000 | 2.9368 | 0.0038 | 16.1955 | 8.1292 | 15.9218 | 15.9516 |
### Framework versions
- Transformers 4.50.0
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1