esuriddick
committed on
Commit
·
9ce5f5b
1
Parent(s):
5e35bbc
Update README.md
README.md
CHANGED
@@ -18,17 +18,19 @@ This model is a fine-tuned version of [allenai/led-base-16384](https://huggingfa
It achieves the following results on the evaluation set:
- Loss: 1.2887
## Model description
-
-
-
-
-
## Training procedure
It achieves the following results on the evaluation set:
- Loss: 1.2887
+ROUGE metrics on the validation and test sets are not reported because computing them required more processing time and memory than Kaggle supported at the time.
+
## Model description
+As described in [Longformer: The Long-Document Transformer](https://arxiv.org/pdf/2004.05150.pdf) by Iz Beltagy, Matthew E. Peters, and Arman Cohan, [AllenAI's Longformer Encoder-Decoder (LED)](https://github.com/allenai/longformer#longformer) was initialized from [*bart-base*](https://huggingface.co/facebook/bart-base), since both models share the same architecture. To process 16K tokens, *bart-base*'s position embedding matrix was simply copied 16 times.
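The copy step described above can be sketched on a toy table. This is only an illustration under stated assumptions, not the authors' conversion script: the helper name `tile_position_embeddings` and the 4 x 3 toy table are hypothetical. For *bart-base*, the same tiling would take its 1024 learned positions to 1024 x 16 = 16384.

```python
from typing import List

Matrix = List[List[float]]


def tile_position_embeddings(base: Matrix, copies: int) -> Matrix:
    """Repeat the rows of `base` end to end, `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Copy each row so the extended table does not alias the original rows.
    return [list(row) for _ in range(copies) for row in base]


# Hypothetical toy table: 4 positions, 3 dimensions per position.
toy_base = [[float(pos), pos + 0.1, pos + 0.2] for pos in range(4)]
extended = tile_position_embeddings(toy_base, copies=16)

print(len(extended))               # 64 rows = 4 positions x 16 copies
print(extended[5] == toy_base[1])  # True: position 5 reuses row 5 % 4
```

After tiling, position `i` in the extended table starts from the embedding of position `i % len(base)`, so every block of the long input begins with values the base model already learned.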
+This model is especially interesting for long-range summarization and question answering.
+## Intended uses & limitations
+[pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) is a pre-processed version of [ccdv/govreport-summarization](https://huggingface.co/datasets/ccdv/govreport-summarization), a long-document summarization dataset adapted from this [repository](https://github.com/luyang-huang96/LongDocSum) and this [paper](https://arxiv.org/pdf/2104.02112.pdf).
+AllenAI's LED model was fine-tuned on this dataset, allowing it to summarize documents of up to 16384 tokens.
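A minimal sketch of summarizing a long document with a fine-tuned LED checkpoint might look as follows. It assumes the `transformers` and `torch` packages are installed. `model_id` is a placeholder for this model's repository id, which is not given here, and `global_mask_row` and `summarize` are hypothetical helper names. Putting global attention on the first token only follows the recommendation in the Hugging Face LED documentation for summarization. `num_beams=4` and `max_new_tokens=512` are arbitrary choices.

```python
from typing import List

MAX_INPUT_TOKENS = 16384  # input limit stated in the model card


def global_mask_row(length: int) -> List[int]:
    """Return a mask with global attention (1) on the first token only."""
    return [1 if i == 0 else 0 for i in range(length)]


def summarize(text: str, model_id: str, max_new_tokens: int = 512) -> str:
    """Summarize `text` with the LED checkpoint named by `model_id`."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate anything beyond the 16384-token limit.
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=MAX_INPUT_TOKENS
    )
    num_tokens = inputs["input_ids"].shape[1]
    global_mask = torch.tensor([global_mask_row(num_tokens)])

    with torch.no_grad():
        summary_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            global_attention_mask=global_mask,
            max_new_tokens=max_new_tokens,
            num_beams=4,
        )
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]
```

Calling `summarize(report_text, model_id)` with the repository id of the fine-tuned checkpoint returns one summary string. Inputs longer than 16384 tokens are truncated rather than rejected.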
## Training procedure