library_name: transformers
license: apache-2.0
base_model: t5-small
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: cnn_news_summary_model_trained_on_reduced_data
  results: []
datasets:
- abisee/cnn_dailymail
cnn_news_summary_model_trained_on_reduced_data
This model is a fine-tuned version of t5-small on the abisee/cnn_dailymail dataset. It achieves the following results on the evaluation set:
- Loss: 1.6597
- Rouge1: 0.2162
- Rouge2: 0.0943
- Rougel: 0.1834
- Rougelsum: 0.1834
- Generated Length: 19.0
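For readers unfamiliar with the metrics above, here is a minimal sketch of how a ROUGE-N F1 score can be computed from n-gram overlap. It is a simplified illustration that assumes lowercase whitespace tokenization and no stemming. It is not the implementation that produced the numbers reported in this card, and `ngrams` and `rouge_n_f1` are hypothetical helper names.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """Simplified ROUGE-N F1: clipped n-gram overlap between two strings.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped counts of shared n-grams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n_f1(reference, candidate, n=1))  # unigram overlap (Rouge1-style)
print(rouge_n_f1(reference, candidate, n=2))  # bigram overlap (Rouge2-style)
```

Scores fall between 0 and 1, so a Rouge1 of 0.2162 means a fairly small share of unigrams is shared between the generated and reference summaries under the card's actual scorer.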
Model description
Base Model: t5-small, which is a smaller version of the T5 (Text-to-Text Transfer Transformer) model developed by Google.
This model is fine-tuned to summarize news articles and is suited to quickly condensing large volumes of text into short summaries. On the evaluation set, generated summaries average 19 tokens.
Intended uses & limitations
Intended Use
The model is designed for text summarization, which involves condensing long pieces of text into shorter, more digestible summaries. Here are some specific use cases:
News Summarization: Quickly summarizing news articles to provide readers with the main points.
Document Summarization: Condensing lengthy reports or research papers into brief overviews.
Content Curation: Helping content creators and curators to generate summaries for newsletters, blogs, or social media posts.
Educational Tools: Assisting students and educators by summarizing academic texts and articles.
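The use cases above all follow the same inference pattern, which can be sketched as below. This is a minimal example under stated assumptions, not a documented usage recipe for this model: `build_input` and `summarize` are hypothetical helper names, `model_id` is a placeholder for the full Hub repository ID or a local path, the `summarize: ` task prefix is the one conventionally used with T5 summarization checkpoints, and the 512-token input limit and 60-token output limit are illustrative choices.

```python
def build_input(article, prefix="summarize: "):
    """Prepend the T5 task prefix and collapse runs of whitespace."""
    return prefix + " ".join(article.split())


def summarize(article, model_id, max_new_tokens=60):
    """Generate a summary for one article.

    model_id is a placeholder: pass the full Hub repo ID or a local path.
    """
    # Imported lazily so build_input can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: inputs longer than 512 tokens are truncated.
    inputs = tokenizer(
        build_input(article),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(article_text, model_id)` downloads the checkpoint on first use. Text beyond the truncation limit is ignored, which is one reason very long documents may yield incomplete summaries.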
Limitations
The model has several limitations:
Accuracy: The summaries generated might not always capture all the key points accurately, especially for complex or nuanced texts.
Bias: The model can inherit biases present in the training data, which might affect the quality and neutrality of the summaries.
Context Understanding: It might struggle with understanding the full context of very long documents, leading to incomplete or misleading summaries.
Language and Style: The model’s output might not always match the desired tone or style, requiring further editing.
Data Dependency: Performance can vary depending on the quality and nature of the input data. It performs best on data similar to its training set (news articles).
Training and evaluation data
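The model was fine-tuned and evaluated on the abisee/cnn_dailymail dataset. As the model name indicates, only a reduced portion of the data was used; this card does not specify which subset. Training used the Adam optimizer with a learning rate of 2e-05 for 2 epochs.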
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
- mixed_precision_training: Native AMP
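As a rough guide to reproducing this setup, the hyperparameters above can be mapped onto `Seq2SeqTrainingArguments` from Transformers. The sketch below is an assumption about how such a configuration might look, not the author's training script: `training_kwargs` and `build_training_args` are hypothetical helpers, `output_dir` is a made-up path, the batch sizes are treated as per-device values on a single device, and `predict_with_generate=True` is assumed because ROUGE was reported at evaluation time.

```python
def training_kwargs(output_dir="cnn_news_summary_model"):
    """Map the listed hyperparameters onto Seq2SeqTrainingArguments keywords."""
    return {
        "output_dir": output_dir,            # hypothetical output path
        "learning_rate": 2e-5,
        "per_device_train_batch_size": 16,   # assumes a single device
        "per_device_eval_batch_size": 16,
        "seed": 42,
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "adam_epsilon": 1e-8,
        "lr_scheduler_type": "linear",
        "num_train_epochs": 2,
        "fp16": True,                        # Native AMP; needs a CUDA GPU
        "predict_with_generate": True,       # assumption: ROUGE needs generated text
    }


def build_training_args():
    """Build the arguments object (requires transformers and a CUDA GPU for fp16)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**training_kwargs())
```

Keeping the values in a plain dictionary makes it easy to override a single setting, for example disabling `fp16` when no GPU is available.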
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Generated Length |
---|---|---|---|---|---|---|---|---|
No log | 1.0 | 288 | 1.6727 | 0.217 | 0.0949 | 0.1841 | 0.1839 | 19.0 |
1.9118 | 2.0 | 576 | 1.6597 | 0.2162 | 0.0943 | 0.1834 | 0.1834 | 19.0 |
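The step counts in the table allow two quick back-of-the-envelope checks, sketched below. Both rest on assumptions not stated in this card: a single device with no gradient accumulation (so one step consumes one batch of 16), and a linear schedule with zero warmup steps that decays to zero at the final step. `linear_lr` is a hypothetical helper, not a library function.

```python
def linear_lr(step, base_lr=2e-5, total_steps=576, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


steps_per_epoch = 288
batch_size = 16

# Upper bound on training examples per epoch; the last batch may be partial.
approx_examples = steps_per_epoch * batch_size

print(approx_examples)   # 4608
print(linear_lr(0))      # 2e-05 at the start of training
print(linear_lr(288))    # 1e-05 at the end of epoch 1
print(linear_lr(576))    # 0.0 at the end of epoch 2
```

Under these assumptions the model saw roughly 4,600 training examples per epoch, which is consistent with the "reduced data" in the model name.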
Framework versions
- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1