tsmatz/mt5_summarize_japanese

---
license: apache-2.0
tags:
- summarization
- generated_from_trainer
metrics:
- rouge
model-index:
- name: mt5_summarize_japanese
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mt5_summarize_japanese

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on a dataset that is not identified in this card.
It achieves the following results on the evaluation set. These match the final logged evaluation (step 2700) in the training results table:

- Validation loss: 1.8952
- ROUGE-1: 0.4625
- ROUGE-2: 0.2866
- ROUGE-L: 0.3656
- ROUGE-Lsum: 0.3868

## Model description

A [google/mt5-small](https://huggingface.co/google/mt5-small) checkpoint fine-tuned for summarization of Japanese text, as indicated by the model name and the `summarization` tag. More information needed.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0005
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32 (train_batch_size × gradient_accumulation_steps)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 90
- num_epochs: 10
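
The hyperparameter names above come from the auto-generated card. This sketch assumes they correspond to fields of `TrainingArguments`/`Seq2SeqTrainingArguments` in transformers. The code does three things without importing transformers:

- records the values as keyword arguments;
- checks that 2 × 16 gives the listed total batch size of 32 on a single device;
- implements the linear warmup-then-decay multiplier that the `linear` scheduler applies.

The card does not state the total number of optimizer steps. `TOTAL_STEPS = 2770` is a hypothetical value, loosely consistent with the roughly 277 optimizer steps per epoch implied by the Epoch and Step columns of the training results table.

```python
# Hyperparameters from this card, keyed by TrainingArguments field names.
# Passing them as TrainingArguments(output_dir=..., **TRAINING_KWARGS) is an
# assumption about reuse; transformers is deliberately not imported here.
TRAINING_KWARGS = {
    "learning_rate": 5e-4,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "gradient_accumulation_steps": 16,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 90,
    "num_train_epochs": 10,
}


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accumulation_steps * num_devices


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps.

    Mirrors the multiplier used by transformers' get_linear_schedule_with_warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


batch = effective_batch_size(TRAINING_KWARGS["per_device_train_batch_size"],
                             TRAINING_KWARGS["gradient_accumulation_steps"])
print(batch)  # 32, matching total_train_batch_size

# HYPOTHETICAL total optimizer steps: not stated in the card.
TOTAL_STEPS = 2770
for s in (0, 45, 90, 1000, TOTAL_STEPS):
    print(s, linear_warmup_lr(s, TRAINING_KWARGS["learning_rate"],
                              TRAINING_KWARGS["warmup_steps"], TOTAL_STEPS))
```

With a warmup of 90 steps, the learning rate climbs from 0 to 0.0005 over the first 90 optimizer steps, not micro-batches. It then falls linearly toward 0 at the final step.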

### Training results

| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
| 4.2501        | 0.36  | 100  | 3.3685          | 0.3114 | 0.1654 | 0.2627 | 0.2694    |
| 3.6436        | 0.72  | 200  | 3.0095          | 0.3023 | 0.1634 | 0.2684 | 0.2764    |
| 3.3044        | 1.08  | 300  | 2.8025          | 0.3414 | 0.1789 | 0.2912 | 0.2984    |
| 3.2693        | 1.44  | 400  | 2.6284          | 0.3616 | 0.1935 | 0.2979 | 0.3132    |
| 3.2025        | 1.8   | 500  | 2.5271          | 0.3790 | 0.2042 | 0.3046 | 0.3192    |
| 2.9772        | 2.17  | 600  | 2.4203          | 0.4083 | 0.2374 | 0.3422 | 0.3542    |
| 2.9133        | 2.53  | 700  | 2.3863          | 0.3847 | 0.2096 | 0.3316 | 0.3406    |
| 2.9383        | 2.89  | 800  | 2.3573          | 0.4016 | 0.2297 | 0.3361 | 0.3500    |
| 2.7608        | 3.25  | 900  | 2.3223          | 0.3999 | 0.2249 | 0.3461 | 0.3566    |
| 2.7864        | 3.61  | 1000 | 2.2293          | 0.3932 | 0.2219 | 0.3297 | 0.3445    |
| 2.7846        | 3.97  | 1100 | 2.2097          | 0.4386 | 0.2617 | 0.3766 | 0.3826    |
| 2.7495        | 4.33  | 1200 | 2.1879          | 0.4100 | 0.2449 | 0.3481 | 0.3551    |
| 2.6092        | 4.69  | 1300 | 2.1515          | 0.4398 | 0.2714 | 0.3787 | 0.3842    |
| 2.5598        | 5.05  | 1400 | 2.1195          | 0.4366 | 0.2545 | 0.3621 | 0.3736    |
| 2.5283        | 5.41  | 1500 | 2.0637          | 0.4274 | 0.2551 | 0.3649 | 0.3753    |
| 2.5947        | 5.77  | 1600 | 2.0588          | 0.4454 | 0.2800 | 0.3828 | 0.3921    |
| 2.5354        | 6.14  | 1700 | 2.0357          | 0.4253 | 0.2582 | 0.3546 | 0.3687    |
| 2.5203        | 6.5   | 1800 | 2.0263          | 0.4444 | 0.2686 | 0.3648 | 0.3764    |
| 2.5303        | 6.86  | 1900 | 1.9926          | 0.4455 | 0.2771 | 0.3795 | 0.3948    |
| 2.4953        | 7.22  | 2000 | 1.9576          | 0.4523 | 0.2873 | 0.3869 | 0.4053    |
| 2.4271        | 7.58  | 2100 | 1.9384          | 0.4455 | 0.2811 | 0.3713 | 0.3862    |
| 2.4462        | 7.94  | 2200 | 1.9230          | 0.4530 | 0.2846 | 0.3754 | 0.3947    |
| 2.3303        | 8.3   | 2300 | 1.9311          | 0.4519 | 0.2814 | 0.3755 | 0.3887    |
| 2.3916        | 8.66  | 2400 | 1.9213          | 0.4598 | 0.2897 | 0.3688 | 0.3889    |
| 2.5995        | 9.03  | 2500 | 1.9060          | 0.4526 | 0.2820 | 0.3733 | 0.3946    |
| 2.3348        | 9.39  | 2600 | 1.9021          | 0.4595 | 0.2856 | 0.3762 | 0.3988    |
| 2.4035        | 9.74  | 2700 | 1.8952          | 0.4625 | 0.2866 | 0.3656 | 0.3868    |
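
The evaluation results at the top of this card equal the last row (step 2700). That checkpoint has the lowest validation loss and the highest ROUGE-1, but not the highest ROUGE-2, ROUGE-L, or ROUGE-Lsum. The sketch below copies the table's evaluation columns into Python and reports which logged step is best for each column. The column names and the `best_step` helper are illustrative.

```python
from typing import List, Tuple

# Rows copied from the training results table:
# (step, validation_loss, rouge1, rouge2, rougeL, rougeLsum)
Row = Tuple[int, float, float, float, float, float]
RESULTS: List[Row] = [
    (100, 3.3685, 0.3114, 0.1654, 0.2627, 0.2694),
    (200, 3.0095, 0.3023, 0.1634, 0.2684, 0.2764),
    (300, 2.8025, 0.3414, 0.1789, 0.2912, 0.2984),
    (400, 2.6284, 0.3616, 0.1935, 0.2979, 0.3132),
    (500, 2.5271, 0.3790, 0.2042, 0.3046, 0.3192),
    (600, 2.4203, 0.4083, 0.2374, 0.3422, 0.3542),
    (700, 2.3863, 0.3847, 0.2096, 0.3316, 0.3406),
    (800, 2.3573, 0.4016, 0.2297, 0.3361, 0.3500),
    (900, 2.3223, 0.3999, 0.2249, 0.3461, 0.3566),
    (1000, 2.2293, 0.3932, 0.2219, 0.3297, 0.3445),
    (1100, 2.2097, 0.4386, 0.2617, 0.3766, 0.3826),
    (1200, 2.1879, 0.4100, 0.2449, 0.3481, 0.3551),
    (1300, 2.1515, 0.4398, 0.2714, 0.3787, 0.3842),
    (1400, 2.1195, 0.4366, 0.2545, 0.3621, 0.3736),
    (1500, 2.0637, 0.4274, 0.2551, 0.3649, 0.3753),
    (1600, 2.0588, 0.4454, 0.2800, 0.3828, 0.3921),
    (1700, 2.0357, 0.4253, 0.2582, 0.3546, 0.3687),
    (1800, 2.0263, 0.4444, 0.2686, 0.3648, 0.3764),
    (1900, 1.9926, 0.4455, 0.2771, 0.3795, 0.3948),
    (2000, 1.9576, 0.4523, 0.2873, 0.3869, 0.4053),
    (2100, 1.9384, 0.4455, 0.2811, 0.3713, 0.3862),
    (2200, 1.9230, 0.4530, 0.2846, 0.3754, 0.3947),
    (2300, 1.9311, 0.4519, 0.2814, 0.3755, 0.3887),
    (2400, 1.9213, 0.4598, 0.2897, 0.3688, 0.3889),
    (2500, 1.9060, 0.4526, 0.2820, 0.3733, 0.3946),
    (2600, 1.9021, 0.4595, 0.2856, 0.3762, 0.3988),
    (2700, 1.8952, 0.4625, 0.2866, 0.3656, 0.3868),
]
COLUMNS = ("validation_loss", "rouge1", "rouge2", "rougeL", "rougeLsum")


def best_step(column: str, rows: List[Row] = RESULTS) -> int:
    """Return the step with the lowest validation loss or the highest ROUGE score."""
    idx = COLUMNS.index(column) + 1  # +1 skips the step field
    pick = min if column == "validation_loss" else max
    return pick(rows, key=lambda row: row[idx])[0]


for column in COLUMNS:
    print(column, best_step(column))
# validation_loss 2700
# rouge1 2700
# rouge2 2400
# rougeL 2000
# rougeLsum 2000
```

In the transformers `Trainer`, you can keep a metric-selected checkpoint instead of the last one by setting `load_best_model_at_end=True` and `metric_for_best_model` in `TrainingArguments`. This card does not say whether those options were used.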

### Framework versions

- Transformers 4.23.1
- Pytorch 1.12.1+cu102
- Datasets 2.6.1
- Tokenizers 0.13.1
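
To check an environment against the versions listed above, the sketch below compares installed versions with the pins using only the standard library (`importlib.metadata`, Python 3.8+). The one naming assumption is that "Pytorch" maps to the pip distribution name `torch`. The `+cu102` suffix is a PEP 440 local version label marking the CUDA 10.2 build. The comparison is an exact string match, so a differently built PyTorch wheel that reports, for example, plain `1.12.1` is flagged as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions listed in this card, keyed by pip distribution name (assumed mapping).
PINNED = {
    "transformers": "4.23.1",
    "torch": "1.12.1+cu102",
    "datasets": "2.6.1",
    "tokenizers": "0.13.1",
}


def version_mismatches(pinned: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from its pin to the
    installed version, or to None if the package is not installed."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in pinned.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


print(version_mismatches(PINNED))
```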