FLAN-T5-Small Fine-Tuned for Summarization
This model is a fine-tuned flan-t5-small transformer trained to summarize text when instructed to do so. It was trained on abstracts from publicly available research papers and their summaries.
Model Overview
- Base Model: flan-t5-small
- Type: Text summarization
- Framework: Hugging Face Transformers
- Training Method: LoRA using the PyTorch backend
Dataset Used
pieetie/pubmed-abstract-summary
- 4,331 biomedical research abstracts and their one-sentence summaries from the United States National Library of Medicine.
- Summaries contain the key findings, methods, and significance of the research.
- Each entry was duplicated, and the duplicate's abstract had its sentences shuffled; any HTML tags present were removed (see the preprocessing sketch below).
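As a rough illustration of that preprocessing, the sketch below loads the dataset with 🤗 Datasets, strips HTML tags, and builds a shuffled-sentence duplicate of each abstract. The split name, the column names (`abstract`, `summary`), the naive sentence splitting, and the regex-based tag removal are all assumptions for illustration; the dataset card and the training code are authoritative.

```python
import random
import re

from datasets import concatenate_datasets, load_dataset

# Assumption: a single "train" split with "abstract" and "summary" columns.
dataset = load_dataset("pieetie/pubmed-abstract-summary", split="train")

def strip_html(text):
    # Drop any leftover HTML tags such as <i> or <sup>.
    return re.sub(r"<[^>]+>", "", text)

def clean(example):
    return {"abstract": strip_html(example["abstract"]), "summary": example["summary"]}

def shuffled_duplicate(example):
    # Duplicate an entry with its abstract sentences shuffled (naive split on ". ").
    sentences = strip_html(example["abstract"]).split(". ")
    random.shuffle(sentences)
    return {"abstract": ". ".join(sentences), "summary": example["summary"]}

cleaned = dataset.map(clean)
augmented = dataset.map(shuffled_duplicate)
full_dataset = concatenate_datasets([cleaned, augmented])  # originals + shuffled duplicates
```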
Training Configuration
LoRA Parameters
| Parameter | Value |
|---|---|
| Rank | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Bias | none |
| Target Modules | q, v |
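These values map directly onto a PEFT `LoraConfig`. The sketch below shows one way they might be wired up; the `task_type` value and the surrounding variable names are assumptions, not taken from the training code.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

lora_config = LoraConfig(
    r=16,                       # Rank
    lora_alpha=32,              # Alpha
    lora_dropout=0.05,          # Dropout
    bias="none",                # Bias
    target_modules=["q", "v"],  # T5 attention query/value projections
    task_type="SEQ_2_SEQ_LM",   # assumption: sequence-to-sequence task type
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```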
Training Parameters
| Parameter | Value |
|---|---|
| Epochs | 5 |
| Learning Rate | 1e-3 |
| Batch Size | 16 |
| Optimizer | AdamW |
| Scheduler | Linear |
| Label Pad Token ID | -100 |
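For reference, here is a hedged sketch of how these settings might look with the Transformers Trainer API. The use of `Seq2SeqTrainer`, the output directory, and the `tokenized_train`/`tokenized_eval` datasets are placeholders and assumptions; `model` is the LoRA-wrapped model from the sketch above.

```python
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-small-summary-finetune",  # placeholder output path
    num_train_epochs=5,
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    optim="adamw_torch",         # AdamW optimizer
    lr_scheduler_type="linear",  # linear learning-rate schedule
)

# Padding positions in the labels are replaced with -100 so the loss ignores them.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, label_pad_token_id=-100)

trainer = Seq2SeqTrainer(
    model=model,                    # LoRA-wrapped model from the sketch above
    args=training_args,
    train_dataset=tokenized_train,  # placeholder: tokenized train/validation splits
    eval_dataset=tokenized_eval,
    data_collator=data_collator,
)
trainer.train()
```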
Evaluation Metrics
| Metric | Value |
|---|---|
| Training Loss | ~1.7974 |
| Validation Loss | ~1.4674 |
| Training Time | ~1.2918 hrs |
Hardware
M4 MacBook Pro
- 16 GB Unified RAM
- 10-core CPU
- 10-core GPU
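The card does not say which PyTorch device was used; on Apple silicon this typically means the MPS backend. A minimal device-selection sketch under that assumption (`model` is the LoRA-wrapped model from the sketches above):

```python
import torch

# Prefer the Metal Performance Shaders (MPS) backend on Apple silicon,
# falling back to CPU when it is unavailable.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)
```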
GitHub Repository
The GitHub repository containing the training code can be found here.
Example Inference
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

# Load the fine-tuned model with the base flan-t5-small tokenizer.
summarizer = pipeline(
    task="summarization",
    model=AutoModelForSeq2SeqLM.from_pretrained("KedarPanchal/flan-t5-small-summary-finetune"),
    tokenizer=AutoTokenizer.from_pretrained("google/flan-t5-small"),
)

text = """Summarize the following: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data."""

summarizer(text, min_new_tokens=10, do_sample=False)
```
Generated text: "The transformer, based on attention mechanisms dispensing with recurrence and convolutions, achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over existing best results, and establishing a single-model state-of-the-art BLUE score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the literature training costs of best models."
License
GNU GPLv3 License, 2025 Kedar Panchal. See the LICENSE file for more information.