# BART-Based Text Summarization Model for News Aggregation

This repository hosts a BART transformer model fine-tuned for abstractive text summarization of news articles. It is designed to condense lengthy news reports into concise, informative summaries, enhancing the experience for news readers and aggregators.

## Model Details

- **Model Architecture:** BART (Facebook's BART-base)
- **Task:** Abstractive Text Summarization
- **Domain:** News Articles
- **Dataset:** Reddit-TIFU (Hugging Face Datasets)
- **Fine-tuning Framework:** Hugging Face Transformers

## Usage

### Installation

```bash
pip install datasets transformers rouge-score evaluate
```

### Loading the Model

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

# Load tokenizer and model
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "facebook/bart-base"  # replace with the path to this repository to load the fine-tuned weights
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to(device)
```

A sketch of generating a summary with the loaded model is given under "Example: Generating a Summary" near the end of this README.

## Performance Metrics

- **ROUGE-1:** 25.50
- **ROUGE-2:** 7.86
- **ROUGE-L:** 20.64
- **ROUGE-Lsum:** 21.18

## Fine-Tuning Details

### Dataset

The dataset is sourced from Hugging Face's Reddit-TIFU dataset, which contains roughly 79,000 Reddit posts paired with their summaries. The original training and testing sets were merged, shuffled, and re-split using a 90/10 ratio.

### Training Configuration

- **Epochs:** 3
- **Batch Size:** 8
- **Learning Rate:** 2e-5
- **Evaluation Strategy:** epoch

A minimal fine-tuning sketch using these settings is given under "Example: Fine-Tuning Sketch" near the end of this README.

### Quantization

Post-training quantization was applied using PyTorch's built-in quantization framework to reduce the model size and improve inference efficiency. One plausible recipe is sketched near the end of this README.

## Repository Structure

```
.
├── config.json
├── tokenizer_config.json
├── special_tokens_map.json
├── tokenizer.json
├── model.safetensors         # Fine-tuned model weights
└── README.md                 # Model documentation
```

## Limitations

- The model was fine-tuned on Reddit-TIFU posts and may not generalize well to domains outside this dataset.
- Quantization may result in minor accuracy degradation compared to the full-precision model.

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.
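## Example: Generating a Summary

The snippet below is a minimal sketch of producing a summary with the tokenizer and model loaded in the Usage section. The example article text and the generation parameters (beam size, length limits) are illustrative assumptions, not settings shipped with this model.

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "facebook/bart-base"  # or the path to this repository's fine-tuned weights
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name).to(device)

article = "Long news article text goes here..."  # placeholder input

# Tokenize (truncating to BART's 1024-token encoder limit) and generate a summary.
inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt").to(device)
summary_ids = model.generate(
    inputs["input_ids"],
    num_beams=4,        # assumed beam size
    max_length=128,     # assumed cap on summary length
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```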
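## Example: Fine-Tuning Sketch

The training script itself is not included in this repository, so the following is only a plausible reconstruction of the fine-tuning setup using the hyperparameters listed under Training Configuration. The Reddit-TIFU column names (`documents`, `tldr`), the tokenization lengths, the random seed, and the output directory are assumptions.

```python
from datasets import load_dataset
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    DataCollatorForSeq2Seq,
    Trainer,
    TrainingArguments,
)

model_name = "facebook/bart-base"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Load Reddit-TIFU (long posts with TL;DR summaries), shuffle, and re-split 90/10.
dataset = load_dataset("reddit_tifu", "long", split="train")
dataset = dataset.shuffle(seed=42).train_test_split(test_size=0.1)

def preprocess(batch):
    # "documents" holds the post body, "tldr" the reference summary.
    model_inputs = tokenizer(batch["documents"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["tldr"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)

training_args = TrainingArguments(
    output_dir="bart-reddit-tifu",   # assumed output directory
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```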
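## Example: Dynamic Quantization Sketch

The exact post-training quantization recipe is not recorded here, so this is only one plausible way to apply PyTorch's built-in quantization: dynamic int8 quantization of the linear layers. The checkpoint path and output file names are placeholders, and the actual steps used for this model may differ.

```python
import torch
from transformers import BartForConditionalGeneration

# Load the model to quantize (base checkpoint shown; swap in the fine-tuned weights path).
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
model.eval()

# Quantize Linear layer weights to int8; activations are quantized dynamically at runtime.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Rough size comparison via serialized state dicts (file names are placeholders).
torch.save(model.state_dict(), "bart_fp32.pt")
torch.save(quantized_model.state_dict(), "bart_int8.pt")
```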