# 🧠 Text Summarization for Product Descriptions

A **T5-small-based** abstractive summarization model fine-tuned on synthetic product description data. This model generates concise summaries of detailed product descriptions, ideal for catalog optimization, e-commerce listings, and content generation.

---

## ✨ Model Highlights

- 📌 Based on [`t5-small`](https://huggingface.co/t5-small)
- 🧪 Fine-tuned on a synthetic dataset of 50+ product descriptions and their summaries
- ⚡ Supports **abstractive summarization** of English product texts
- 🧠 Built using **Hugging Face Transformers** and **PyTorch**

---

## 🧠 Intended Uses

- ✅ Auto-generating product summaries for catalogs or online listings
- ✅ Shortening verbose product descriptions for UI-friendly displays
- ✅ Content-creation support for e-commerce and marketing

---

## 🚫 Limitations

- ❌ English-only (not trained on multilingual input)
- 🧠 Cannot fact-check or verify real-world product details
- 🧪 Trained on synthetic data; real-world generalization may be limited
- ⚠️ May generate generic or repetitive summaries for complex inputs

---

## 🏋️‍♂️ Training Details

| Attribute        | Value                                      |
|------------------|--------------------------------------------|
| Base Model       | `t5-small`                                 |
| Dataset          | Custom synthetic CSV of product summaries  |
| Input Field      | `product_description`                      |
| Target Field     | `summary`                                  |
| Max Token Length | 512 (input) / 64 (summary)                 |
| Epochs           | 3                                          |
| Batch Size       | 4                                          |
| Optimizer        | AdamW                                      |
| Loss Function    | CrossEntropyLoss (via `Trainer`)           |
| Framework        | PyTorch + Transformers                     |
| Hardware         | CUDA-enabled GPU                           |

A runnable sketch of this setup appears under "Reproducing Training (Sketch)" at the end of this card.

---

## 📊 Evaluation Metrics

| Metric     | Score (synthetic eval set) |
|------------|----------------------------|
| ROUGE-1    | 24.49                      |
| ROUGE-2    | 22.10                      |
| ROUGE-L    | 24.47                      |
| ROUGE-Lsum | 24.46                      |

See "Computing ROUGE (Sketch)" at the end of this card for how scores like these can be computed.

---

## 🚀 Usage

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_name = "your-username/Text-Summarization-for-Product-Descriptions"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

def summarize(text, model, tokenizer, max_input_length=512, max_output_length=64):
    device = next(model.parameters()).device  # get device (cpu or cuda)
    input_text = "summarize: " + text.strip()  # T5 task prefix used during training
    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        truncation=True,
        padding="max_length",
        max_length=max_input_length,
    ).to(device)  # move inputs to the model's device

    with torch.no_grad():
        summary_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_output_length,
            num_beams=4,
            early_stopping=True,
        )

    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

# Example
text = (
    "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, "
    "auto shut-off, and a 360-degree swivel base."
)
print("Summary:", summarize(text, model, tokenizer))
```

An alternative invocation via the high-level `pipeline` API is sketched at the end of this card.

## 📁 Repository Structure

```
.
├── model/                      # Fine-tuned model files (pytorch_model.bin, config.json)
├── tokenizer/                  # Tokenizer config and vocab
├── training_script.py          # Training code
├── product_descriptions.csv    # Source dataset
├── utils.py                    # Preprocessing & summarization utilities
├── README.md                   # Model card
```

## 🤝 Contributing

Feel free to raise issues or suggest improvements via pull requests. Further training on real-world data and multilingual support are planned for future updates.
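
## 🔁 Reproducing Training (Sketch)

A minimal sketch of how the setup in the Training Details table could be reproduced with `Seq2SeqTrainer`. The CSV path, column names, task prefix, max lengths, epochs, and batch size come from this card; the output directory, logging cadence, and everything else are assumptions, not the exact `training_script.py`.

```python
import pandas as pd
from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Columns per the Training Details table: product_description, summary
df = pd.read_csv("product_descriptions.csv")
dataset = Dataset.from_pandas(df)

def preprocess(batch):
    # Same task prefix and max lengths as the Usage section (512 in / 64 out)
    inputs = ["summarize: " + d for d in batch["product_description"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="t5-product-summarizer",  # assumed output path
    num_train_epochs=3,                  # from the table
    per_device_train_batch_size=4,       # from the table
    logging_steps=10,                    # assumption
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    # Pads inputs dynamically and sets padded label positions to -100
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()  # Trainer defaults to AdamW and the model's cross-entropy loss
trainer.save_model("t5-product-summarizer")
```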
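
## 📏 Computing ROUGE (Sketch)

A minimal sketch of scoring with the `evaluate` library, which is one standard way to obtain the ROUGE numbers reported above. The prediction/reference strings here are placeholders, not the card's actual eval set.

```python
import evaluate

rouge = evaluate.load("rouge")

predictions = ["1.7 l electric kettle with fast boil and auto shut-off"]
references = ["Electric kettle with 1.7-liter capacity, fast-boil tech, and auto shut-off."]

scores = rouge.compute(predictions=predictions, references=references)
# `evaluate` returns F-measures in [0, 1]; the table above reports them x 100
print({k: round(v * 100, 2) for k, v in scores.items()})
# keys: rouge1, rouge2, rougeL, rougeLsum
```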
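
## 🧰 Alternative: `pipeline` Usage (Sketch)

For quick experiments, the high-level `pipeline` API can replace the manual generation loop in the Usage section. This sketch assumes the fine-tuned checkpoint keeps `t5-small`'s `task_specific_params`, which let the summarization pipeline prepend the `summarize:` prefix automatically; the repo id is the same placeholder as above.

```python
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="your-username/Text-Summarization-for-Product-Descriptions",  # placeholder repo id
)

text = (
    "This sleek electric kettle features a 1.7-liter capacity, fast-boil tech, "
    "auto shut-off, and a 360-degree swivel base."
)
# Same generation settings as the Usage section
result = summarizer(text, max_length=64, num_beams=4, early_stopping=True)
print(result[0]["summary_text"])
```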