update model card README.md
README.md CHANGED
@@ -2,8 +2,8 @@
  license: apache-2.0
  tags:
  - generated_from_trainer
-
- -
  model-index:
  - name: long-t5-base-govreport
    results: []
@@ -14,18 +14,14 @@ should probably proofread and complete it, then remove this comment. -->

  # long-t5-base-govreport

- This model is a fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the
  It achieves the following results on the evaluation set:
- -
- -
- -
- -
- -
- -
- - eval_runtime: 880.2631
- - eval_samples_per_second: 0.284
- - eval_steps_per_second: 0.284
- - step: 0

  ## Model description

@@ -44,16 +40,56 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.
- - train_batch_size:
  - eval_batch_size: 1
  - seed: 4299
  - gradient_accumulation_steps: 128
- - total_train_batch_size:
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.05
- - num_epochs:

  ### Framework versions

@@ -2,8 +2,8 @@
  license: apache-2.0
  tags:
  - generated_from_trainer
+ metrics:
+ - rouge
  model-index:
  - name: long-t5-base-govreport
    results: []
@@ -14,18 +14,14 @@ should probably proofread and complete it, then remove this comment. -->

  # long-t5-base-govreport

+ This model is a fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the None dataset.
  It achieves the following results on the evaluation set:
+ - Gen Len: 787.34
+ - Loss: 1.5448
+ - Rouge1: 57.2303
+ - Rouge2: 24.9705
+ - Rougel: 26.8081
+ - Rougelsum: 54.2747

  ## Model description

@@ -44,16 +40,56 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 3
  - eval_batch_size: 1
  - seed: 4299
  - gradient_accumulation_steps: 128
+ - total_train_batch_size: 384
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.05
+ - num_epochs: 25.0
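The batch-size values added in this hunk agree with each other if training ran on a single device, since 3 × 128 = 384; the card does not state the device count, so that is an assumption. The sketch below recomputes that figure and shows one common form of a cosine schedule with linear warmup. `NUM_DEVICES`, `TOTAL_STEPS`, and both function names are hypothetical, and the schedule is an illustration that is not claimed to match the trainer's exact implementation.

```python
import math

# Values copied from the hyperparameter list in this hunk.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 3
GRADIENT_ACCUMULATION_STEPS = 128
WARMUP_RATIO = 0.05

# Assumption: one device, so data parallelism adds no extra multiplier.
NUM_DEVICES = 1

# Hypothetical total number of optimizer steps, used only to exercise the code.
TOTAL_STEPS = 1600


def effective_batch_size(per_device: int, accumulation: int,
                         devices: int = NUM_DEVICES) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device * accumulation * devices


def cosine_lr(step: int, total_steps: int,
              base_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 384
for s in (0, 40, 80, 800, 1600):
    print(s, cosine_lr(s, TOTAL_STEPS))
```

With a warmup ratio of 0.05, the first 5% of steps ramp the learning rate up linearly, after which it decays along a half cosine.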
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Gen Len | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
+ |:-------------:|:-----:|:----:|:-------:|:---------------:|:-------:|:-------:|:-------:|:---------:|
+ | 2.1198 | 0.39 | 25 | 805.336 | 1.8720 | 29.4332 | 7.3761 | 17.0816 | 25.065 |
+ | 1.8609 | 0.78 | 50 | 833.404 | 1.7601 | 35.3533 | 10.6624 | 18.643 | 31.6979 |
+ | 1.7805 | 1.17 | 75 | 866.356 | 1.6833 | 36.5786 | 11.1185 | 20.0358 | 33.2116 |
+ | 1.7352 | 1.56 | 100 | 822.348 | 1.6524 | 40.5489 | 13.0695 | 20.1256 | 37.1369 |
+ | 1.7371 | 1.95 | 125 | 765.6 | 1.6294 | 43.8594 | 15.2962 | 20.7807 | 40.3461 |
+ | 1.6428 | 2.34 | 150 | 844.184 | 1.6055 | 44.5054 | 15.731 | 21.2582 | 40.9775 |
+ | 1.6567 | 2.73 | 175 | 857.236 | 1.6031 | 47.3641 | 16.9664 | 21.4998 | 43.994 |
+ | 1.5773 | 3.12 | 200 | 841.86 | 1.5855 | 47.2284 | 17.3099 | 21.6793 | 43.9018 |
+ | 1.5614 | 3.52 | 225 | 832.8 | 1.5883 | 46.4612 | 17.1368 | 21.5931 | 43.1184 |
+ | 1.5328 | 3.91 | 250 | 790.056 | 1.5730 | 46.5685 | 17.5423 | 22.2082 | 43.1811 |
+ | 1.5194 | 4.3 | 275 | 825.868 | 1.5690 | 47.6205 | 18.377 | 22.7639 | 44.3701 |
+ | 1.571 | 4.69 | 300 | 794.032 | 1.5676 | 49.2203 | 19.1109 | 22.8005 | 46.0679 |
+ | 1.4275 | 5.08 | 325 | 833.068 | 1.5656 | 50.6982 | 20.0278 | 23.5585 | 47.5036 |
+ | 1.4912 | 5.47 | 350 | 793.068 | 1.5625 | 50.3371 | 19.8639 | 23.3666 | 47.1898 |
+ | 1.4764 | 5.86 | 375 | 819.86 | 1.5532 | 50.9702 | 20.7532 | 23.8765 | 47.9915 |
+ | 1.3972 | 6.25 | 400 | 770.78 | 1.5564 | 49.279 | 19.4781 | 23.1018 | 46.1942 |
+ | 1.4479 | 6.64 | 425 | 806.244 | 1.5529 | 50.3317 | 20.2888 | 23.4454 | 47.3491 |
+ | 1.4567 | 7.03 | 450 | 787.48 | 1.5590 | 52.2209 | 21.2868 | 23.9284 | 49.1691 |
+ | 1.3933 | 7.42 | 475 | 842.664 | 1.5561 | 51.9578 | 20.5806 | 23.7177 | 48.9121 |
+ | 1.4245 | 7.81 | 500 | 813.772 | 1.5420 | 52.3725 | 21.7787 | 24.5209 | 49.4003 |
+ | 1.3033 | 8.2 | 525 | 824.66 | 1.5499 | 52.7839 | 21.589 | 24.5617 | 49.8609 |
+ | 1.3673 | 8.59 | 550 | 807.348 | 1.5530 | 53.2339 | 22.152 | 24.7587 | 50.2502 |
+ | 1.3634 | 8.98 | 575 | 767.952 | 1.5458 | 53.0293 | 22.3194 | 25.174 | 50.078 |
+ | 1.3095 | 9.37 | 600 | 856.252 | 1.5412 | 53.7658 | 22.5229 | 25.0448 | 50.708 |
+ | 1.3492 | 9.76 | 625 | 826.064 | 1.5389 | 51.8662 | 21.6229 | 24.6819 | 48.8648 |
+ | 1.3007 | 10.16 | 650 | 843.544 | 1.5404 | 53.6692 | 22.154 | 24.6218 | 50.6864 |
+ | 1.2729 | 10.55 | 675 | 808.764 | 1.5428 | 54.6479 | 23.3029 | 25.5647 | 51.6394 |
+ | 1.3758 | 10.94 | 700 | 800.152 | 1.5403 | 54.9418 | 23.3323 | 25.6087 | 51.9256 |
+ | 1.3357 | 11.33 | 725 | 814.496 | 1.5455 | 55.2511 | 23.5606 | 25.8237 | 52.3183 |
+ | 1.2817 | 11.72 | 750 | 811.144 | 1.5412 | 55.2847 | 23.6632 | 25.9341 | 52.3146 |
+ | 1.2771 | 12.11 | 775 | 852.704 | 1.5450 | 55.1956 | 23.5545 | 25.677 | 52.1841 |
+ | 1.2892 | 12.5 | 800 | 805.844 | 1.5369 | 54.9563 | 23.5105 | 25.8876 | 51.9568 |
+ | 1.2757 | 12.89 | 825 | 813.476 | 1.5467 | 56.4728 | 24.6875 | 26.4415 | 53.4939 |
+ | 1.2382 | 13.28 | 850 | 787.34 | 1.5448 | 57.2303 | 24.9705 | 26.8081 | 54.2747 |
+
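The results table above can be read two ways when picking a checkpoint: the lowest validation loss appears at step 800, while the ROUGE scores peak at the last logged row, step 850, which is also the row the summary metrics in the card report. The sketch below uses a handful of rows copied from the table to show both selections and to estimate the number of optimizer steps per epoch from the Epoch and Step columns. The `EvalRow` type and variable names are hypothetical, and the steps-per-epoch figure is an inference from the table, not a value stated in the card.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    epoch: float
    val_loss: float
    rouge1: float


# A subset of rows copied from the training results table.
ROWS = [
    EvalRow(25, 0.39, 1.8720, 29.4332),
    EvalRow(625, 9.76, 1.5389, 51.8662),
    EvalRow(800, 12.5, 1.5369, 54.9563),
    EvalRow(825, 12.89, 1.5467, 56.4728),
    EvalRow(850, 13.28, 1.5448, 57.2303),
]

# Two possible selection criteria for the "best" checkpoint.
best_by_loss = min(ROWS, key=lambda r: r.val_loss)
best_by_rouge1 = max(ROWS, key=lambda r: r.rouge1)

# Rough estimate of optimizer steps per epoch from the last logged row.
steps_per_epoch = round(ROWS[-1].step / ROWS[-1].epoch)

print(best_by_loss.step, best_by_rouge1.step, steps_per_epoch)  # 800 850 64
```

The table stops at epoch 13.28 although `num_epochs` is 25.0, so the logged run covers roughly half of the configured schedule; the card does not say why.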

  ### Framework versions
