Commit ec115ff (verified) by bilal521 · 1 parent: 8e71d0d

Create README.md

Files changed (1)
  1. README.md +55 -0
README.md ADDED
@@ -0,0 +1,55 @@
+ # 📺 T5 YouTube Summarizer
+
+ This is a fine-tuned [`t5-base`](https://huggingface.co/t5-base) model for abstractive summarization of YouTube video transcripts. It was trained on a custom dataset of video transcripts paired with manually written summaries.
+
+ ---
+
+ ## ✨ Model Details
+
+ - **Base Model**: [`t5-base`](https://huggingface.co/t5-base)
+ - **Task**: Abstractive Summarization
+ - **Training Data**: YouTube video transcripts and human-written summaries
+ - **Max Input Length**: 512 tokens
+ - **Max Output Length**: 256 tokens
+ - **Fine-tuning Epochs**: 10
+ - **Tokenizer**: `T5Tokenizer` (pretrained)
+
+ ---
+
+ ## 🧠 Intended Use
+
+ This model is designed to generate short, informative summaries from long transcripts of educational or conceptual YouTube videos. It can be used for:
+
+ - Quick understanding of long videos
+ - Automated content summaries for blogs, platforms, or note-taking tools
+ - Enhancing accessibility for long-form spoken content
+
+ ---
+
+ ## 🚀 How to Use
+
+ ```python
+ from transformers import T5ForConditionalGeneration, T5Tokenizer
+
+ # Load the fine-tuned model and its tokenizer
+ # ("your-username/..." is a placeholder repository ID)
+ model = T5ForConditionalGeneration.from_pretrained("your-username/t5-youtube-summarizer")
+ tokenizer = T5Tokenizer.from_pretrained("your-username/t5-youtube-summarizer")
+
+ # Define input text
+ text = "The video talks about coordinate covalent bonds, giving examples from..."
+
+ # Prepend the T5 task prefix and tokenize; input beyond 512 tokens is truncated
+ inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
+
+ # Generate the summary with beam search
+ summary_ids = model.generate(
+     inputs,
+     max_length=256,
+     min_length=80,
+     num_beams=5,
+     length_penalty=2.0,
+     no_repeat_ngram_size=3,
+     early_stopping=True,
+ )
+
+ # Decode the best beam back to text
+ summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
+ print(summary)
+ ```
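
Note on long transcripts: the usage snippet in this README truncates its input at 512 tokens, so anything beyond that point in a long transcript is never seen by the model. One possible workaround, which is not part of this commit, is to split the transcript into overlapping windows and summarize each window separately. The sketch below shows only the splitting step. The helper name `chunk_words` and the values `max_words=350` and `overlap=50` are illustrative assumptions, and whitespace-separated words are used as a rough stand-in for T5 tokens, which is why the window is kept well below 512.

```python
from typing import List


def chunk_words(text: str, max_words: int = 350, overlap: int = 50) -> List[str]:
    """Split text into overlapping windows of whitespace-separated words.

    max_words and overlap are illustrative values, not settings from the
    model card. Words only approximate T5 tokens, so the window is kept
    well under the 512-token input limit.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    # Synthetic 800-word transcript, used only to show the window sizes.
    transcript = " ".join(f"w{i}" for i in range(800))
    parts = chunk_words(transcript)
    print(len(parts), [len(p.split()) for p in parts])
```

Each returned chunk could then be passed through the same `"summarize: "` prefix, `tokenizer.encode`, and `model.generate` calls shown in the README, with the per-chunk summaries joined afterwards. Whether that yields better summaries than plain truncation for this model is untested here.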