
t5-small Quantized Model for Text Summarization on the Reddit-TIFU Dataset

This repository hosts a quantized version of the t5-small model, fine-tuned for text summarization using the Reddit-TIFU dataset. The model has been optimized using FP16 quantization for efficient deployment without significant accuracy loss.

Model Details

  • Model Architecture: t5-small (the smallest of the original T5 checkpoints)
  • Task: Text summarization
  • Dataset: Reddit-TIFU (Hugging Face Datasets)
  • Quantization: Float16 (FP16)
  • Model Size: ~60.5M parameters
  • Fine-tuning Framework: Hugging Face Transformers

Installation

pip install datasets transformers rouge-score evaluate

Loading the Model

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

# Load the tokenizer and model
# (replace "t5-small" with the fine-tuned quantized checkpoint, e.g. the files
#  under quantized-model/, to reproduce the results reported below)
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)

# Example input text
new_text = """
Today I was late to my morning meeting because I spilled coffee all over my laptop. 
Then I realized my backup laptop was also out of battery. 
Eventually joined from my phone, only to find out the meeting was cancelled.
"""

# Generate a summary with beam search
def generate_summary(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        max_length=512,
        truncation=True
    ).to(device)

    summary_ids = model.generate(
        inputs["input_ids"],
        max_length=100,
        min_length=5,
        num_beams=4,
        length_penalty=2.0,
        early_stopping=True
    )

    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print(generate_summary(new_text))
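
Since the published weights are stored in FP16, you can also load them in half precision directly. The snippet below is a sketch: the local path quantized-model/ is assumed to contain the files listed under Repository Structure.

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Optional: load the quantized checkpoint directly in FP16 on a GPU.
# "quantized-model/" is an assumed local path to the files in this repository.
tokenizer = AutoTokenizer.from_pretrained("quantized-model/")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "quantized-model/", torch_dtype=torch.float16
).to("cuda")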

Performance Metrics

  • ROUGE-1: 19.590
  • ROUGE-2: 4.270
  • ROUGE-L: 16.390
  • ROUGE-Lsum: 16.800
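
Scores like these can be computed with the evaluate and rouge-score packages from the Installation section. The sketch below shows only the call pattern; the two strings are illustrative placeholders, not outputs of this model.

import evaluate

# Compute ROUGE between generated summaries and reference summaries.
# Both strings below are placeholders for real predictions and references.
rouge = evaluate.load("rouge")
predictions = ["i spilled coffee on my laptop and the meeting was cancelled anyway"]
references = ["Spilled coffee, a dead backup laptop, and the meeting was cancelled."]
scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum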

Fine-Tuning Details

Dataset

The dataset is sourced from Hugging Face’s Reddit-TIFU dataset and contains roughly 79,000 Reddit posts paired with their summaries.
The original training and testing sets were merged, shuffled, and re-split using a 90/10 ratio.
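
A minimal sketch of this re-split, assuming the reddit_tifu dataset on the Hugging Face Hub (the "short" configuration and the random seed are assumptions, not values taken from the actual run):

from datasets import load_dataset

# Load Reddit-TIFU ("short" configuration assumed) and re-split 90/10.
dataset = load_dataset("reddit_tifu", "short", split="train")
splits = dataset.shuffle(seed=42).train_test_split(test_size=0.1, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
print(len(train_ds), len(test_ds))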

Training

  • Epochs: 3
  • Batch size: 8
  • Learning rate: 2e-5
  • Evaluation strategy: epoch
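
The hyperparameters above map roughly onto Hugging Face Seq2SeqTrainingArguments as sketched below; output_dir is an assumed path rather than the one used for the actual run.

from transformers import Seq2SeqTrainingArguments

# Hypothetical training arguments mirroring the values listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="finetuned-model",        # assumed output path
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-5,
    evaluation_strategy="epoch",         # "eval_strategy" in recent transformers releases
)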

Quantization

Post-training quantization was applied with PyTorch’s half() method, casting the fine-tuned weights to FP16 to reduce model size and inference time.
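
A minimal sketch of this step; the input and output paths are assumptions:

from transformers import AutoModelForSeq2SeqLM

# Cast the fine-tuned weights to FP16 and save the quantized checkpoint.
# "finetuned-model" and "quantized-model" are assumed local paths.
model = AutoModelForSeq2SeqLM.from_pretrained("finetuned-model")
model = model.half()                     # convert all parameters to torch.float16
model.save_pretrained("quantized-model")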


Repository Structure

.
├── quantized-model/               # Contains the quantized model files
│   ├── config.json
│   ├── generation_config.json
│   ├── model.safetensors
│   ├── special_tokens_map.json
│   ├── spiece.model
│   ├── tokenizer_config.json
│   └── tokenizer.json
└── README.md                      # Model documentation

Limitations

  • The model is trained specifically for summarizing Reddit posts and may not generalize well to other domains.
  • FP16 quantization may result in slight numerical instability in edge cases.

Contributing

Feel free to open issues or submit pull requests to improve the model or documentation.
