update model card README.md
README.md CHANGED
@@ -4,24 +4,9 @@ tags:
 - generated_from_trainer
 datasets:
 - xlsum
-metrics:
-- rouge
 model-index:
 - name: mt5-swatf
-  results:
-  - task:
-      name: Sequence-to-sequence Language Modeling
-      type: text2text-generation
-    dataset:
-      name: xlsum
-      type: xlsum
-      config: swahili
-      split: validation
-      args: swahili
-    metrics:
-    - name: Rouge1
-      type: rouge
-      value: 9.7053
+  results: []
 ---

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -30,13 +15,6 @@ should probably proofread and complete it, then remove this comment. -->
 # mt5-swatf

 This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the xlsum dataset.
-It achieves the following results on the evaluation set:
-- Loss: nan
-- Rouge1: 9.7053
-- Rouge2: 1.3021
-- Rougel: 8.4306
-- Rougelsum: 8.4159
-- Gen Len: 683.08

 ## Model description

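For reference, a minimal inference sketch for this checkpoint follows. The path `mt5-swatf` matches the model name in the card but is assumed to be a local output directory or Hub repo id; the input text, truncation length, and generation settings are illustrative assumptions, not values recorded in the card.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed checkpoint location: the card only names the model "mt5-swatf",
# so this could be a local output directory or a Hub repo id.
model_path = "mt5-swatf"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

# Placeholder Swahili news text; XL-Sum pairs articles with short reference summaries.
article = "Serikali imetangaza mpango mpya wa maji safi vijijini ..."

inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(
    **inputs,
    num_beams=4,             # generation settings are assumptions, not from the card
    max_new_tokens=64,
    no_repeat_ngram_size=2,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The Rouge1/Rouge2/Rougel figures in the removed block are on a 0-100 scale and come from comparing generated summaries like this against the XL-Sum reference summaries on the validation split.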
@@ -55,31 +33,19 @@ More information needed
 ### Training hyperparameters

 The following hyperparameters were used during training:
-- learning_rate:
-- train_batch_size:
-- eval_batch_size:
+- learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 4
 - seed: 42
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
--
--
+- lr_scheduler_warmup_steps: 90
+- num_epochs: 5

 ### Training results

-| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
-|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|
-| 0.0           | 0.8   | 500  | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 1.6   | 1000 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 2.4   | 1500 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 3.2   | 2000 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 4.0   | 2500 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 4.8   | 3000 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 5.6   | 3500 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 6.4   | 4000 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 7.2   | 4500 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 8.0   | 5000 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 8.8   | 5500 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |
-| 0.0           | 9.6   | 6000 | nan             | 9.7053 | 1.3021 | 8.4306 | 8.4159    | 683.08  |


 ### Framework versions
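The added hyperparameters give an effective batch of 128 sequences per optimizer step (train_batch_size 8 times gradient_accumulation_steps 16), which is the total_train_batch_size recorded above. A rough sketch of how these values map onto the Trainer API follows; the output directory and the predict_with_generate flag are assumptions, since the training script itself is not part of this card.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the card's hyperparameters onto Seq2SeqTrainingArguments;
# the actual training script is not included in the card.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-swatf",              # assumed output directory
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,      # 8 * 16 = 128 total train batch size
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=90,
    num_train_epochs=5,
    predict_with_generate=True,          # assumed, so ROUGE can be computed at eval time
)
```

The optimizer line in the card (betas (0.9, 0.999), epsilon 1e-08) matches the Trainer's default settings, so no optimizer arguments are set explicitly in this sketch.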