	---
	library_name: transformers
	license: mit
	base_model: facebook/bart-large-cnn
	tags:
	- generated_from_trainer
	metrics:
	- rouge
	model-index:
	- name: bart-large-cnn-finetuned
	  results:
	  - task:
	      type: summarization
	      name: Summarization
	    dataset:
	      name: billsum
	      type: billsum
	      config: 3.0.0
	      split: train
	    metrics:
	    - name: ROUGE-1
	      type: rouge
	      value: 51.9605
	      verified: true
	    - name: ROUGE-2
	      type: rouge
	      value: 20.8149
	      verified: true
	    - name: ROUGE-L
	      type: rouge
	      value: 36.2784
	      verified: true
	    - name: ROUGE-LSUM
	      type: rouge
	      value: 47.1043
	      verified: true
	    - name: loss
	      type: loss
	      value: 1.1553
	      verified: true
	    - name: gen_len
	      type: gen_len
	      value: 63.9903
	      verified: true
	pipeline_tag: summarization
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# bart-large-finetuned-billsum

	This model is a fine-tuned version of [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn) on the [FiscalNote/billsum](https://huggingface.co/datasets/FiscalNote/billsum) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.1553
	- Rouge1: 51.9605
	- Rouge2: 36.2784
	- Rougel: 44.1511
	- Rougelsum: 47.1043
	- Gen Len: 63.9903
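
ROUGE-1 and ROUGE-2 measure unigram and bigram overlap between a generated summary and its reference summary, and ROUGE-L uses their longest common subsequence; each is usually reported as an F-measure, here on a 0-100 scale. The card does not say which scorer produced these numbers, so the pure-Python sketch below only illustrates the definitions under simplifying assumptions (lowercase whitespace tokenization, no stemming, no ROUGE-Lsum sentence splitting). It will not exactly match library scorers such as the `rouge_score` package. The example sentences are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase + whitespace split, with no stemming or
    # punctuation handling (library scorers tokenize differently).
    return text.lower().split()


def ngram_counts(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f_measure(overlap: int, pred_len: int, ref_len: int) -> float:
    # F1 = 2PR / (P + R), which simplifies to 2 * overlap / (pred_len + ref_len).
    if pred_len + ref_len == 0:
        return 0.0
    return 2 * overlap / (pred_len + ref_len)


def rouge_n(prediction: str, reference: str, n: int) -> float:
    pred = ngram_counts(tokenize(prediction), n)
    ref = ngram_counts(tokenize(reference), n)
    overlap = sum((pred & ref).values())  # matches clipped to the smaller count
    return f_measure(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Longest common subsequence via dynamic programming, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction: str, reference: str) -> float:
    pred, ref = tokenize(prediction), tokenize(reference)
    return f_measure(lcs_length(pred, ref), len(pred), len(ref))


# Hypothetical reference/prediction pair.
reference = "the bill amends the tax code"
prediction = "the bill changes the tax code"
# Scale by 100 to match the 0-100 scale used in this card.
print(f"ROUGE-1: {rouge_n(prediction, reference, 1) * 100:.2f}")  # ROUGE-1: 83.33
print(f"ROUGE-2: {rouge_n(prediction, reference, 2) * 100:.2f}")  # ROUGE-2: 60.00
print(f"ROUGE-L: {rouge_l(prediction, reference) * 100:.2f}")     # ROUGE-L: 83.33
```

Substituting one word costs two bigrams but only one unigram, which is why ROUGE-2 drops much faster than ROUGE-1 here.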

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 1000
	- num_epochs: 3
	- mixed_precision_training: Native AMP
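
With `lr_scheduler_type: linear` and `lr_scheduler_warmup_steps: 1000`, the learning rate rises linearly from 0 to the peak `learning_rate` (2e-05) over the first 1000 optimizer steps, then decays linearly to 0 by the end of training. The plain-Python sketch below reproduces that multiplier for illustration. It follows the same formula as `transformers.get_linear_schedule_with_warmup` but is not the library code. The card does not state the total number of training steps, so `TOTAL_STEPS = 5000` (the last step in the results table) is an assumption.

```python
def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps, clamped at 0 afterwards.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 2e-05      # learning_rate from the card
WARMUP_STEPS = 1000  # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 5000   # assumption: last logged step; the real total is not stated

for step in (0, 500, 1000, 3000, 5000):
    print(step, f"{linear_warmup_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS):.1e}")
```

Under these assumptions the rate peaks at 2.0e-05 at step 1000 and is back to 1.0e-05 halfway through the decay (step 3000).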

	### Training results

	| Training Loss | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
	|:-------------:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
	| 1.4735        | 1000 | 1.3306          | 50.6543 | 33.9684 | 42.2550 | 45.4452   | 63.9983 |
	| 1.3146        | 2000 | 1.2376          | 51.0888 | 34.9554 | 42.9847 | 45.8933   | 63.9903 |
	| 1.1542        | 3000 | 1.1874          | 51.5755 | 35.6875 | 43.6806 | 46.5762   | 63.9800 |
	| 1.0917        | 4000 | 1.1714          | 51.8612 | 36.1809 | 44.0608 | 47.0279   | 63.9870 |
	| 1.0380        | 5000 | 1.1553          | 51.9605 | 36.2784 | 44.1511 | 47.1043   | 63.9903 |
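
The evaluation results listed near the top of the card are identical to the step-5000 row, which is also the evaluation with the lowest validation loss, and every ROUGE score improves at each evaluation. The short sketch below checks both observations against the table values. It only restates the table; the card does not say whether checkpoint selection (for example the Trainer's `load_best_model_at_end` option) was enabled.

```python
# Evaluation rows copied from the training-results table:
# (step, validation loss, ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-Lsum)
ROWS = [
    (1000, 1.3306, 50.6543, 33.9684, 42.2550, 45.4452),
    (2000, 1.2376, 51.0888, 34.9554, 42.9847, 45.8933),
    (3000, 1.1874, 51.5755, 35.6875, 43.6806, 46.5762),
    (4000, 1.1714, 51.8612, 36.1809, 44.0608, 47.0279),
    (5000, 1.1553, 51.9605, 36.2784, 44.1511, 47.1043),
]

# Pick the evaluation with the lowest validation loss.
best = min(ROWS, key=lambda row: row[1])
print(f"best step: {best[0]}, val loss: {best[1]}, ROUGE-1: {best[2]}")
# best step: 5000, val loss: 1.1553, ROUGE-1: 51.9605

# Check that validation loss fell, and every ROUGE score rose, between evaluations.
consecutive = list(zip(ROWS, ROWS[1:]))
loss_decreasing = all(b[1] < a[1] for a, b in consecutive)
rouge_increasing = all(b[i] > a[i] for a, b in consecutive for i in range(2, 6))
print(loss_decreasing, rouge_increasing)  # True True
```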

	### Usage

	```python
	from transformers import pipeline

	summarizer = pipeline("summarization", model="luluw/bart-large-cnn-finetuned")

	text = """
	The paper "Attention is All You Need" revolutionized the field of natural language processing (NLP) by introducing the Transformer architecture, which relies solely on attention mechanisms to model long-range dependencies in sequential data. Prior to this, models like recurrent neural networks (RNNs) and convolutional neural networks (CNNs) were the primary tools for sequence modeling, but they suffered from limitations such as difficulty in parallelization and the vanishing gradient problem. The Transformer, however, breaks free from these constraints by using a self-attention mechanism, which allows it to attend to different parts of a sequence simultaneously, leading to more efficient training and better performance on tasks such as machine translation, text summarization, and language modeling.
	The core innovation of the Transformer model lies in its multi-head self-attention mechanism. Unlike RNNs that process sequences step-by-step, the Transformer processes the entire sequence at once by applying self-attention to every word or token. This allows each token to weigh the relevance of other tokens in the sequence, giving the model a global understanding of context. Multi-head attention refers to applying multiple attention layers in parallel, enabling the model to focus on different parts of the input sequence simultaneously. This enhances the model's ability to capture various relationships and nuances in the data.
	The Transformer consists of an encoder-decoder structure. The encoder takes in the input sequence, computes self-attention to understand relationships between tokens, and generates a context-aware representation. The decoder, which also incorporates self-attention, generates the output sequence one token at a time by attending to both the previously generated tokens and the encoder's output. This architecture, coupled with position-wise feed-forward networks and layer normalization, makes the Transformer highly scalable and efficient.
	Another significant contribution of the paper is the introduction of positional encoding. Since the Transformer lacks the inherent sequential nature of RNNs, it cannot infer the order of tokens from the architecture itself. To overcome this, the authors introduced positional encodings, which are added to the input embeddings to provide the model with information about the relative position of tokens. These encodings allow the model to maintain a sense of order in the data without explicitly processing tokens sequentially.
	The original Transformer model proposed in Attention is All You Need had six layers each in both the encoder and decoder. Each layer consists of multi-head attention and feed-forward layers, with residual connections and normalization. The model was trained using the Adam optimizer and applied to machine translation tasks, where it demonstrated state-of-the-art performance, surpassing previous models like LSTMs and GRUs.
	One of the key benefits of the Transformer is its ability to parallelize training, as it does not rely on sequential data processing like RNNs. This parallelism allows it to leverage modern GPU architectures effectively, leading to faster training times and the ability to scale to much larger datasets. Furthermore, Transformers handle long-range dependencies better than previous models because self-attention allows every token to interact with every other token in the sequence, regardless of their distance from each other.
	"""

	# The summarization pipeline returns a list of dicts with a "summary_text" key.
	print(summarizer(text, max_new_tokens=128)[0]['summary_text'])
	# Example output:
	# Attention is All You Need is a paper that revolutionized the field of natural
	# language processing (NLP) by introducing the Transformer architecture, which relies
	# solely on attention mechanisms to model long-range dependencies in sequential data.
	# The Transformer consists of an encoder-decoder structure: the encoder takes in the
	# input sequence, computes self-attention to understand relationships between tokens,
	# and generates a context-aware representation; and the decoder generates the output
	# sequence one token at a time by attending to both the previously generated tokens
	# and encoder output.
	```

	### Framework versions

	- Transformers 4.44.2
	- Pytorch 2.2.1+cu121
	- Datasets 2.21.0
	- Tokenizers 0.19.1