michaelthwan committed on
Commit 103f38e · 1 Parent(s): 96a6d6a
DEV_LOG.md ADDED
@@ -0,0 +1,161 @@
+ # DEV log
+
+ Documenting interesting findings and the journey of text tuning.
+
+ Without a side-by-side comparison, it is quite difficult to judge the quality of different text summarization approaches. This is, of course, a common situation in generative AI (e.g., image models).
+ Therefore, tuning is a good opportunity to record the outputs for a deeper understanding.
+
+ ## 20230617 Enhancement for timestamp summary
+
+ People seem to prefer a shorter timestamped summary, as long as the useful key points remain in the summary.
+
+ Test video: [[Youtube] 8 Surprising Habits That Made Me A Millionaire](https://www.youtube.com/watch?v=IScN1SOcj7A&ab_channel=AliAbdaal)
+
+ GPT-model: `gpt-3.5-turbo-16k`
+
+ - Previous version: 20-word constraint
+
+ ```
+ 0:00 - Habits for financial success: multiple streams of income, streams not reliant on trading time for money.
+ 0:32 - Strategic laziness: automating tasks, optimizing time, focusing on outcomes.
+ 1:03 - Developing an entrepreneurial mindset: identifying problems and finding solutions.
+ 2:03 - Applying strategic laziness to business: focusing on the end goal and avoiding unnecessary tasks.
+ 3:06 - Growth mindset: believing that everything is figureoutable.
+ 4:39 - The power of networking: making friends in real life and online, utilizing Twitter.
+ 6:10 - The importance of reading: gaining wisdom and knowledge from books.
+ 8:16 - Acquiring financial literacy: taking control of your financial life and understanding money
+ ```
+
+ - Tuning attempt #1: 10-word constraint
+
+ ```
+ 0:00 - Introduction to habits that helped the speaker become a millionaire.
+ 0:32 - Mindset shift: Millionaires rely on multiple streams of income.
+ 1:03 - Strategic laziness: Optimizing time and automating tasks.
+ 1:33 - Being strategic about goals and outcomes.
+ 2:03 - Applying strategic laziness to business and focusing on what really matters.
+ 2:36 - Developing an entrepreneurial diagnostic mindset to identify problems and solutions.
+ 3:06 - Recognizing problems in everyday life as potential business ideas.
+ 4:08 - Embracing a growth mindset: Everything is figureoutable.
+ 4:39 - Accelerating learning by asking the right people for help.
+ 5:10 - The importance of making friends and networking both in real life and online.
+ 6:10 - Leveraging the power of Twitter to connect with like-minded individuals and improve business growth.
+ 7:13 - The benefits of reading books and gaining wisdom from successful individuals.
+ 8:47 - Making financial literacy a priority to understand money management and investments
+ ```
+
+ - Tuning attempt #2: 8-word constraint (works better)
+
+ ```
+ 0:00 - Introduction to habits for becoming a millionaire
+ 0:32 - Mindset shift: multiple income streams are key
+ 1:03 - Embrace strategic laziness to optimize time
+ 1:33 - Focus on the outcome and efficient tasks
+ 2:03 - Entrepreneurial mindset: identify problems and solutions
+ 2:36 - Embrace growth mindset: everything is figureoutable
+ 3:06 - Learn from the right people, ask for help
+ 3:37 - Making friends and networking on Twitter
+ 4:08 - Value of reading books and acquiring wisdom
+ 5:10 - Financial literacy is essential for success
+ ```
+
+ Let's squeeze further, for fun.
+
+ - Tuning attempt #3: 5-word constraint (works better)
+
+ ```
+ 0:00 - Mindset shift for financial success
+ 0:32 - Strategic laziness and time optimization
+ 1:03 - Develop an entrepreneurial diagnostic mindset
+ 1:33 - Having a growth mindset
+ 2:03 - Being strategically lazy in achieving goals
+ 2:36 - Recognizing problems and creating solutions
+ 3:06 - Everything is figureoutable
+ 4:39 - Asking the right people for help
+ 5:10 - Making friends in real life and on the internet
+ 6:10 - Reading books for wisdom and knowledge
+ 9:19 - Acquiring financial literacy for success
+ ```
+
+ - Tuning attempt #4: 3-word constraint
+ For some reason, the model stopped following the word constraint here.
+
+ ```
+ 0:00 - Financial success requires multiple streams of income.
+ 0:32 - Strategic laziness optimizes time and productivity.
+ 1:33 - Be strategically lazy and focus on the end goal.
+ 2:03 - Approach business strategically and avoid wasting time.
+ 3:06 - Develop an entrepreneurial mindset and identify problems to solve.
+ 4:39 - Embrace a growth mindset and believe that everything is figureoutable.
+ 5:40 - Leverage your network and seek help from the right people.
+ 6:42 - Build both real-life and internet friendships, especially on Twitter.
+ 7:44 - Reading books accelerates learning and provides valuable wisdom.
+ 9:51 - Acquire financial literacy and take control of your financial life
+ ```
+
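Attempt #4 in the log above shows that the model does not always respect the word cap, so checking the outputs by eye is error-prone. Below is a minimal sketch of an automatic check, assuming every summary line has the `M:SS - text` (or `H:MM:SS - text`) shape seen in these outputs and that words are whitespace-separated. `check_word_limit` is a hypothetical helper, not part of this repository.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for lines like "1:03 - Strategic laziness ..." (M:SS or H:MM:SS).
TIMESTAMP_LINE = re.compile(r"^(\d{1,2}(?::\d{2}){1,2})\s*-\s*(.+)$")


def check_word_limit(summary: str, max_words: int) -> List[Tuple[str, int]]:
    """Return (timestamp, word_count) for every timestamped line longer than max_words."""
    violations = []
    for raw in summary.splitlines():
        match = TIMESTAMP_LINE.match(raw.strip())
        if not match:
            continue  # skip headers, blank lines and anything without a leading timestamp
        timestamp, text = match.groups()
        count = len(text.split())  # naive whitespace word count
        if count > max_words:
            violations.append((timestamp, count))
    return violations


if __name__ == "__main__":
    # First three lines of tuning attempt #4 (3-word constraint)
    attempt_4 = "\n".join([
        "0:00 - Financial success requires multiple streams of income.",
        "0:32 - Strategic laziness optimizes time and productivity.",
        "1:33 - Be strategically lazy and focus on the end goal.",
    ])
    print(check_word_limit(attempt_4, max_words=3))
    # [('0:00', 7), ('0:32', 6), ('1:33', 9)]
```

Every line is flagged, which matches the observation that the model ignored the 3-word cap. Running the same check over each attempt gives a quick numeric comparison to go with the qualitative notes.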
+ ## 20230617 Enhancement for summary
+
+ The original summary is not concise enough.
+
+ Original
+
+ ```
+ Summary:
+ - This video discusses the challenges faced by individuals with ADHD in terms of organization, focus, and productivity.
+ - It emphasizes the importance of understanding and supporting ADHD individuals rather than labeling them negatively.
+ - The video provides strategies to manage ADHD symptoms, such as establishing connections, setting measurable goals, creating a structured environment, prioritizing quality sleep, engaging in regular exercise, and achieving a flow state.
+ - It highlights how the right support and guidance can transform ADHD from a hindrance to an asset, enabling individuals to achieve success and fulfillment.
+ - The video discusses the gender disparities in the diagnosis of ADHD, particularly among females.
+ - It points out that females tend to be the most undiagnosed group due to their symptoms not including hyperactivity or disruption.
+ - The video emphasizes that understanding these distinctions helps to break down the myths and stigma associated with ADHD.
+ - It notes that ADHD is not an inability or disability but rather a different way of thinking, with the potential for extraordinary achievements.
+ - The video encourages individuals with ADHD to embrace their creative and energetic nature, as well as their entrepreneurial spirit, and embrace challenges.
+ - It stresses the importance of having the right tools and support in place to harness the power of one’s brain and reach their full potential.
+ - The video aims to provide guidance on managing ADHD and concludes by expressing a hope for viewers to utilize their own brilliance to make the world a better place.
+ ```
+
+
+ - Tuning attempt #1: formatted with summary and highlights; unconstrained content
+
+ ```
+ Summary:
+ This video discusses eight surprising habits that helped the speaker become a millionaire. The habits include having multiple streams of income, embracing strategic laziness, developing an entrepreneurial mindset, adopting a growth mindset, seeking help from the right people, making friends in real life and on the internet, reading a lot of books, and acquiring financial literacy.
+
+ Highlights:
+ - Having multiple streams of income and not relying solely on a day job is key to financial success.
+ - Strategic laziness involves optimizing time and automating tasks to work smarter, not harder.
+ - Developing an entrepreneurial mindset means identifying problems and finding solutions.
+ - Adopting a growth mindset and believing that everything is figureoutable can lead to success.
+ - Seeking help from the right people and networking is important for accelerating learning and growth.
+ - Making friends in real life and online, particularly through Twitter, can lead to valuable connections and opportunities.
+ - Reading books is essential for gaining wisdom and accelerating personal development.
+ - Acquiring financial literacy is crucial for understanding saving, investing, and how money works
+ ```
+
+ - Tuning attempt #2: formatted with summary and highlights; constrained content (5 bullet points, <20 words)
+
+ ```
+ Summary: The video discusses eight surprising habits that helped the speaker become a millionaire, including having multiple streams of income, being strategically lazy to save time, developing an entrepreneurial and growth mindset, seeking help from the right people, making connections on Twitter, reading books, and acquiring financial literacy.
+
+ Highlights:
+ - Multiple streams of income and avoiding reliance on a day job are crucial for financial success.
+ - Strategic laziness involves optimizing time and automating tasks to increase productivity.
+ - An entrepreneurial mindset involves identifying problems and creating solutions.
+ - Having a growth mindset and believing everything is figureoutable can overcome obstacles.
+ - Networking and making connections on Twitter can lead to valuable opportunities
+ ```
+
+
+ - Tuning attempt #3: formatted with summary and highlights; constrained content (5 bullet points, <20 words); emoji added
+
+ ```
+ Summary: The video discusses eight surprising habits that helped the speaker become a millionaire. These habits include having multiple streams of income, strategic time management, an entrepreneurial mindset, a growth mindset, seeking help from others, making connections online, extensive reading, and financial literacy.
+
+ Highlights:
+ - 💰 Most millionaires have multiple streams of income, not reliant on trading time for money.
+ - ⏰ Being strategic with time and automating processes can save valuable time.
+ - 🚀 Developing an entrepreneurial mindset involves identifying and solving problems for profit.
+ - 🌱 Adopting a growth mindset and believing everything is figureoutable helps overcome obstacles.
+ - 🤝 Leveraging the support and knowledge of others can accelerate success.
+ ```
+
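The summary attempts in DEV_LOG.md target three rules: a `Summary:` line, at most five highlight bullets of fewer than 20 words each, and (from attempt #3) an emoji at the start of each bullet. Below is a heuristic sketch of checking those rules, assuming the `Summary:` / `Highlights:` / `- ` layout shown in the log. `check_highlights` is hypothetical, and treating "first character is non-ASCII" as "starts with an emoji" is a deliberate simplification.

```python
from typing import Dict, List


def check_highlights(output: str, max_bullets: int = 5, max_words: int = 20,
                     require_emoji: bool = True) -> Dict[str, object]:
    """Heuristically check a 'Summary: ... Highlights: - ...' response."""
    lines = [line.strip() for line in output.splitlines()]
    has_summary = any(line.startswith("Summary:") for line in lines)
    try:
        start = lines.index("Highlights:") + 1
    except ValueError:
        start = len(lines)  # no highlights section at all
    # Only bullets after the "Highlights:" header are counted
    bullets: List[str] = [line[2:].strip() for line in lines[start:] if line.startswith("- ")]
    # The prompt asks for "less than 20 words"; a leading emoji counts as one word here
    too_long = [b for b in bullets if len(b.split()) >= max_words]
    # Rough emoji heuristic: the bullet's first character is non-ASCII
    no_emoji = [b for b in bullets if require_emoji and (not b or b[0].isascii())]
    ok = has_summary and 0 < len(bullets) <= max_bullets and not too_long and not no_emoji
    return {"ok": ok, "bullets": len(bullets), "too_long": too_long, "no_emoji": no_emoji}


if __name__ == "__main__":
    # Abbreviated output of tuning attempt #3
    attempt_3 = "\n".join([
        "Summary: The video discusses eight surprising habits that helped the speaker become a millionaire.",
        "",
        "Highlights:",
        "- 💰 Most millionaires have multiple streams of income, not reliant on trading time for money.",
        "- ⏰ Being strategic with time and automating processes can save valuable time.",
    ])
    print(check_highlights(attempt_3))
```

Passing `require_emoji=False` applies the same check to attempt #2, which had the bullet limits but no emoji.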
digester/gradio_method_service.py CHANGED
@@ -240,12 +240,12 @@ Give the video type with JSON format like {"type": "N things"}, and exclude othe
  [Transcript with timestamp]
  """,
  prompt_main="""
- {transcript_with_ts}
+ {transcript_with_ts}
  """,
  prompt_suffix="""
  [TASK]
  Convert this into youtube summary.
- Separate for 2-5minutes chunk, maximum 20 words for one line.
+ Separate for 2-5 minutes chunk, maximum 6 words as a noun for one line.
  Start with the timestamp followed by the summarized text for that chunk.
  Must use language: {language}

@@ -292,10 +292,13 @@ Additionally, since it is a Tutorial video, provide step by step instructions fo
  }
  FINAL_SUMMARY_FORMAT_CONSTRAINTS = {
  "N things": """
- Items mentioned in the video: (content of N things)
+ Items mentioned in the video: (content of N things. Put different appropriate emoji in the beginning for each bullet point)
  """,
  "Tutorials": """
- Instructions: (step by step instructions)
+ Instructions: (step by step instructions, up to five concise bullet points, less than 20 words. Put different appropriate emoji for each bullet point)
+ """,
+ "Others": """
+ Highlights: [Emoji] (content of highlights, up to five concise bullet points, less than 20 words. Put different appropriate emoji for each bullet point)
  """,
  }

@@ -353,11 +356,11 @@ Instructions: (step by step instructions)

  @classmethod
  def execute_final_summary_chain(cls, g_inputs: GradioInputs, youtube_data: YoutubeData, video_type):
+ format_constraint = cls.FINAL_SUMMARY_FORMAT_CONSTRAINTS[video_type]
  if video_type in cls.FINAL_SUMMARY_TASK_CONSTRAINTS.keys():
  task_constraint = cls.FINAL_SUMMARY_TASK_CONSTRAINTS[video_type]
- format_constraint = cls.FINAL_SUMMARY_FORMAT_CONSTRAINTS[video_type]
  else:
- task_constraint, format_constraint = "", ""
+ task_constraint = ""
  prompt = Prompt(
  cls.FINAL_SUMMARY_PROMPT.prompt_prefix.format(title=youtube_data.title),
  cls.FINAL_SUMMARY_PROMPT.prompt_main.format(transcript=youtube_data.full_content),
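With this change, `format_constraint` is read unconditionally via `cls.FINAL_SUMMARY_FORMAT_CONSTRAINTS[video_type]`, so any `video_type` that is not a key of that dictionary raises `KeyError` before the prompt is built. Below is a minimal sketch of a more forgiving lookup, assuming that falling back to the "Others" format is acceptable for unexpected classifier labels. The dictionaries are abbreviated placeholders for the class attributes in the diff, and `select_constraints` is a hypothetical function, not what this commit does.

```python
from typing import Dict, Tuple

# Placeholder stand-ins for the class attributes in the diff (real prompt text omitted).
FINAL_SUMMARY_TASK_CONSTRAINTS: Dict[str, str] = {
    "N things": "<N things task constraint>",
    "Tutorials": "<Tutorials task constraint>",
}
FINAL_SUMMARY_FORMAT_CONSTRAINTS: Dict[str, str] = {
    "N things": "Items mentioned in the video: (...)",
    "Tutorials": "Instructions: (...)",
    "Others": "Highlights: [Emoji] (...)",
}


def select_constraints(video_type: str, fallback: str = "Others") -> Tuple[str, str]:
    """Return (task_constraint, format_constraint) without raising on unknown types."""
    # dict.get avoids the KeyError that a plain [video_type] lookup raises
    # when the classifier returns a label outside the known keys.
    format_constraint = FINAL_SUMMARY_FORMAT_CONSTRAINTS.get(
        video_type, FINAL_SUMMARY_FORMAT_CONSTRAINTS[fallback]
    )
    # Same behaviour as the diff for task constraints: empty string when absent
    task_constraint = FINAL_SUMMARY_TASK_CONSTRAINTS.get(video_type, "")
    return task_constraint, format_constraint


if __name__ == "__main__":
    print(select_constraints("Tutorials"))
    print(select_constraints("Podcast"))  # hypothetical unknown label: uses the "Others" format
```

Whether a silent fallback is preferable to failing loudly is a design choice; raising keeps classifier bugs visible, while the fallback keeps the UI responsive.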
digester/gradio_ui_service.py CHANGED
@@ -8,7 +8,7 @@ title_html = """
  <p align=\"center\">
  DigestEverythingGPT leverages ChatGPT/LLMs to help users quickly understand essential information from various forms of content, such as podcasts, YouTube videos, and PDF documents.<br>
  The prompt engineering is chained and tuned so that is result is of high quality and fast. It is not a simple single query and response tool.<br>
- Version 20230614_2 (
+ Version 20230617 (
  <a href="https://github.com/michaelthwan/digest-everything-gpt"><i class="fa fa-github"></i> Github</a>
  ) (
  <a href="https://huggingface.co/spaces/michaelthwan/digest-everything-gpt"> HFSpace</a>
digester/test_youtube_chain.py CHANGED
@@ -44,6 +44,11 @@ class VideoExample:
  video_id = "OrElyY7MFVs"
  return VideoExample.get_youtube_data("", video_id)

+ @staticmethod
+ def get_procrastination_long_vid():
+ video_id = "lF_KWLfQFs8"
+ return VideoExample.get_youtube_data("", video_id)
+

  class YoutubeTestChain:
  def __init__(self, api_key: str, gpt_model):
@@ -79,18 +84,20 @@ class YoutubeTestChain:
  if __name__ == '__main__':
  config = get_config()
  api_key = config.get("openai").get("api_key")
+ GPT_MODEL = "gpt-3.5-turbo-16k"
  assert api_key

- gradio_inputs = GradioInputs(apikey_textbox=api_key, source_textbox="", source_target_textbox="", qa_textbox="", chatbot=[], history=[])
+
+ gradio_inputs = GradioInputs(apikey_textbox=api_key, gpt_model_textbox=GPT_MODEL, source_textbox="", source_target_textbox="", qa_textbox="", language_textbox="en-US", chatbot=[], history=[])
  youtube_data: YoutubeData = VideoExample.get_nthings_8_habits()

- youtube_test_chain = YoutubeTestChain(api_key)
+ youtube_test_chain = YoutubeTestChain(api_key, GPT_MODEL)
  # youtube_test_chain.test_youtube_classifier(gradio_inputs, youtube_data)
- youtube_test_chain.test_youtube_timestamped_summary(gradio_inputs, youtube_data)
+ # youtube_test_chain.test_youtube_timestamped_summary(gradio_inputs, youtube_data)
  # video_type = "N things"
  # video_type = "Tutorials"
- # video_type = "Others"
- # youtube_test_chain.test_youtube_final_summary(gradio_inputs, youtube_data, video_type)
+ video_type = "Others"
+ youtube_test_chain.test_youtube_final_summary(gradio_inputs, youtube_data, video_type)

  # converter = Everything2Text4Prompt(openai_api_key="")
  # source_textbox = "youtube"
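The tuning attempts recorded in DEV_LOG.md vary only the word cap in the timestamped-summary suffix (20, 10, 8, 5, 3), while the suffix in `gradio_method_service.py` hard-codes the cap. Below is a sketch of generating those variants from one template so a test harness like the one above could loop over them instead of editing the string by hand. The template text is adapted from the `prompt_suffix` in this commit and may be abbreviated; `TIMESTAMP_SUFFIX_TEMPLATE` and `build_suffixes` are hypothetical names.

```python
from typing import Dict, Iterable

# Hypothetical template adapted from the prompt_suffix in this commit;
# {max_words} replaces the hard-coded word cap.
TIMESTAMP_SUFFIX_TEMPLATE = """
[TASK]
Convert this into youtube summary.
Separate for 2-5 minutes chunk, maximum {max_words} words for one line.
Start with the timestamp followed by the summarized text for that chunk.
Must use language: {language}
"""


def build_suffixes(word_caps: Iterable[int], language: str = "en-US") -> Dict[int, str]:
    """Render one prompt suffix per word cap, keyed by the cap."""
    return {
        cap: TIMESTAMP_SUFFIX_TEMPLATE.format(max_words=cap, language=language)
        for cap in word_caps
    }


if __name__ == "__main__":
    # Word caps tried in DEV_LOG.md: previous version, then attempts #1 to #4
    for cap, suffix in build_suffixes([20, 10, 8, 5, 3]).items():
        print(f"--- cap={cap} ---{suffix}")
```

Any literal braces in such a template (for example the JSON sample `{"type": "N things"}` in the classifier prompt) must be doubled as `{{` and `}}`, because `str.format` treats single braces as placeholders.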