---
license: apache-2.0
base_model: google/long-t5-tglobal-base
tags:
- generated_from_trainer
- synthsumm
metrics:
- rouge
datasets:
- pszemraj/synthsumm
language:
- en
pipeline_tag: summarization
---

# long-t5-tglobal-base-synthsumm_direct

This model was fine-tuned on a synthetic dataset of curated long-context text paired with `GPT-3.5-turbo-1106` summaries, spanning multiple domains and including "random" long-context examples from RedPajama, The Pile, and similar corpora.

- Note: this model has **not** been fine-tuned on any other summarization datasets, only the `synthsumm` data.

Try it out in the [gradio demo](https://huggingface.co/spaces/pszemraj/document-summarization).

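A minimal inference sketch with the `transformers` pipeline (the repo id `pszemraj/long-t5-tglobal-base-synthsumm_direct` is inferred from the card title and may need adjusting):

```python
# Minimal inference sketch -- the repo id below is inferred from the
# model card title; adjust it if the model lives under a different name.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-base-synthsumm_direct",
)

long_document = "..."  # replace with your long-context text
result = summarizer(long_document, max_length=256, no_repeat_ngram_size=3)
print(result[0]["summary_text"])
```

Long-T5 supports inputs well beyond the usual 512/1024-token limits, so the whole document can typically be passed in one call rather than chunked.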
## Model description

This model is a fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the [pszemraj/synthsumm](https://huggingface.co/datasets/pszemraj/synthsumm) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.4378
- Rouge1: 48.0918
- Rouge2: 21.2531
- RougeL: 34.4307
- RougeLsum: 43.0271
- Gen Len: 84.5231

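For context on what the Rouge1 figure measures: ROUGE-1 is the F1 of unigram overlap between the generated and reference summaries. A simplified illustration (the reported scores come from the standard `rouge_score` implementation, which additionally applies tokenization and stemming, so this sketch will not reproduce them exactly):

```python
# Illustrative only: simplified ROUGE-1 F1 (unigram overlap) on
# whitespace-lowercased tokens. The official scores above use the
# `rouge_score` package, which also stems tokens.
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # clipped unigram matches: each reference token counts at most once
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```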

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 1
- eval_batch_size: 1
- seed: 26605
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2.0

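The `inverse_sqrt` schedule warms the learning rate up linearly, then decays it in proportion to 1/sqrt(step). A rough sketch of the shape (the total step count of ~650 is estimated from the results table below; `transformers`' `get_inverse_sqrt_schedule` may differ in its exact constants):

```python
# Sketch of the inverse_sqrt schedule: linear warmup for
# warmup_ratio * total_steps, then 1/sqrt(step) decay.
# total_steps ~= 650 is estimated from the results table (625 logged
# steps at epoch 1.92); this is an approximation, not the exact
# schedule transformers implements.
import math

def inverse_sqrt_lr(step: int, base_lr: float = 3e-4,
                    total_steps: int = 650,
                    warmup_ratio: float = 0.03) -> float:
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * step / warmup_steps   # linear warmup
    return base_lr * math.sqrt(warmup_steps / step)  # 1/sqrt decay
```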
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:--------:|
| 1.9183 | 0.38 | 125 | 1.5762 | 38.7221 | 15.0873 | 28.3123 | 34.9655 | 129.2154 |
| 1.8815 | 0.77 | 250 | 1.5230 | 44.3531 | 17.9384 | 31.7417 | 39.5563 | 87.3538 |
| 1.7264 | 1.15 | 375 | 1.4735 | 45.7781 | 20.1020 | 33.3290 | 41.4737 | 101.9231 |
| 1.8545 | 1.54 | 500 | 1.4505 | 47.0134 | 20.6159 | 33.6118 | 41.6579 | 88.2308 |
| 1.7444 | 1.92 | 625 | 1.4378 | 48.0918 | 21.2531 | 34.4307 | 43.0271 | 84.5231 |

### Framework versions

- Transformers 4.36.0.dev0
- PyTorch 2.1.0
- Datasets 2.15.0
- Tokenizers 0.15.0