metadata
library_name: transformers
license: apache-2.0
base_model: t5-small
tags:
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: cnn_news_summary_model_trained_on_reduced_data
    results: []
datasets:
  - abisee/cnn_dailymail

cnn_news_summary_model_trained_on_reduced_data

This model is a fine-tuned version of t5-small on the cnn_dailymail dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6597
  • Rouge1: 0.2162
  • Rouge2: 0.0943
  • Rougel: 0.1834
  • Rougelsum: 0.1834
  • Generated Length: 19.0
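To make the ROUGE numbers above easier to interpret, here is a minimal sketch of an n-gram overlap F1 score in plain Python. It assumes lowercase whitespace tokenization and no stemming, so it is not the scorer that produced the values on this card and will not reproduce them exactly; the function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N F1: clipped n-gram overlap between two strings."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A short summary that only repeats words from the reference gets perfect precision but low recall, which is one reason scores stay modest when the generated length is around 19 tokens.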

Model description

Base Model: t5-small, which is a smaller version of the T5 (Text-to-Text Transfer Transformer) model developed by Google.

This model is useful for quickly condensing large volumes of news text into short summaries that are easier to digest. On the evaluation set, the generated length was about 19 tokens.
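A minimal sketch of how the checkpoint might be loaded for inference with Transformers. Several names here are assumptions rather than facts from this card: the repository id is inferred from the uploader and model name, the `summarize: ` prefix is the one commonly used with T5 checkpoints, and the 512-token input limit and 20-token output limit are illustrative defaults.

```python
# Assumed Hub repository id (inferred, not stated on the card); verify before use.
MODEL_ID = "adnaan05/cnn_news_summary_model_trained_on_reduced_data"

# Assumed task prefix; the card does not say which prefix, if any, was used in training.
PREFIX = "summarize: "


def build_input(article: str) -> str:
    """Collapse whitespace and prepend the assumed task prefix."""
    return PREFIX + " ".join(article.split())


def summarize(article: str, max_new_tokens: int = 20) -> str:
    """Download the checkpoint (on first call) and generate one summary."""
    # Imported lazily so build_input works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    # Inputs longer than the assumed 512-token limit are truncated.
    inputs = tokenizer(
        build_input(article), return_tensors="pt", truncation=True, max_length=512
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(article_text)` requires `transformers` and `torch` and network access on first use. Reloading the model on every call keeps the sketch short; a real application would load it once.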

Intended uses & limitations

  • Intended Use

    • The model is designed for text summarization, which involves condensing long pieces of text into shorter, more digestible summaries. Here are some specific use cases:

    • News Summarization: Quickly summarizing news articles to provide readers with the main points.

    • Document Summarization: Condensing lengthy reports or research papers into brief overviews.

    • Content Curation: Helping content creators and curators to generate summaries for newsletters, blogs, or social media posts.

    • Educational Tools: Assisting students and educators by summarizing academic texts and articles.

  • Limitations

    • The model has several limitations:

    • Accuracy: The summaries generated might not always capture all the key points accurately, especially for complex or nuanced texts.

    • Bias: The model can inherit biases present in the training data, which might affect the quality and neutrality of the summaries.

    • Context Understanding: It might struggle with understanding the full context of very long documents, leading to incomplete or misleading summaries.

    • Language and Style: The model’s output might not always match the desired tone or style, requiring further editing.

    • Data Dependency: Performance can vary depending on the quality and nature of the input data. It performs best on data similar to its training set (news articles).
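One common workaround for the long-document limitation is to split the input into pieces and summarize each piece separately. The sketch below shows only the splitting step. It assumes word count is an acceptable rough proxy for token count, and the `chunk_words` helper and the 300-word default are illustrative choices, not part of this model.

```python
def chunk_words(text: str, max_words: int = 300) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each chunk is summarized without sight of the others, so facts that depend on context across chunk boundaries can still be lost or misreported.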

Training and evaluation data

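As the model name indicates, training used a reduced subset of the abisee/cnn_dailymail dataset; the size of the subset is not recorded here. The model was trained with the Adam optimizer at a learning rate of 2e-05 for 2 epochs.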

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
  • mixed_precision_training: Native AMP
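To illustrate what the `linear` scheduler does with these settings, here is a small sketch of the learning rate at a given optimizer step. It assumes no warmup (none is listed above) and takes the total of 576 steps from the training results; the function name is illustrative.

```python
BASE_LR = 2e-05     # learning_rate listed above
TOTAL_STEPS = 576   # final step reported in the training results
WARMUP_STEPS = 0    # assumption: no warmup is listed on this card


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the rate starts at 2e-05, is half that at the end of the first epoch (step 288), and reaches zero at step 576.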

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Generated Length |
|---|---|---|---|---|---|---|---|---|
| No log | 1.0 | 288 | 1.6727 | 0.217 | 0.0949 | 0.1841 | 0.1839 | 19.0 |
| 1.9118 | 2.0 | 576 | 1.6597 | 0.2162 | 0.0943 | 0.1834 | 0.1834 | 19.0 |
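The step counts give a rough idea of how large the training subset was. The sketch below bounds the number of training examples from 288 steps per epoch and a batch size of 16. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these is confirmed by this card.

```python
STEPS_PER_EPOCH = 288   # step count at epoch 1.0
TRAIN_BATCH_SIZE = 16   # train_batch_size from the hyperparameters


def example_count_bounds(steps_per_epoch, batch_size):
    """Return (min, max) dataset sizes that produce this many steps per epoch."""
    # The final batch may hold anywhere from 1 to batch_size examples.
    max_examples = steps_per_epoch * batch_size
    min_examples = (steps_per_epoch - 1) * batch_size + 1
    return min_examples, max_examples


low, high = example_count_bounds(STEPS_PER_EPOCH, TRAIN_BATCH_SIZE)
print(f"Estimated training examples: between {low} and {high}")
```

Under those assumptions the subset holds between 4593 and 4608 examples, a small fraction of the full CNN/DailyMail training split.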

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.4.1+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1