all project
- LICENSE +21 -0
- README 2.md +117 -0
- app.py +1 -1
- packages.txt +1 -0
- requirements.txt +20 -0
- transcription.csv +2 -0
- word_embeddings.csv +0 -0
LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Daniel Avila

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README 2.md
ADDED
@@ -0,0 +1,117 @@
<h1 align="center">
YoutubeGPT 🤖
</h1>

Read the article to learn how it works: <a href="https://medium.com/@dan.avila7/youtube-gpt-start-a-chat-with-a-video-efe92a499e60">Medium Article</a>

With Youtube GPT you can extract all the information from a YouTube video just by pasting the video link.
You will obtain the transcription and the embedding of each segment, and you can also ask questions about the video through a chat.

All code was written with the help of <a href="https://codegpt.co">Code GPT</a>

<a href="https://codegpt.co" target="_blank"><img width="753" alt="Screenshot 2023-02-08 at 9 16 43 PM" src="https://user-images.githubusercontent.com/6216945/217699939-eca3ae47-c488-44da-9cf6-c7caef69e1a7.png"></a>

<hr>
<br>

# Features

- Video transcription with **OpenAI Whisper**
- Embedding of transcript segments with the OpenAI API (**text-embedding-ada-002**)
- Chat with the video using **streamlit-chat** and the OpenAI API (**text-davinci-003**)

# Example
For this example we are going to use this video from The PyCoach:
https://youtu.be/lKO3qDLCAnk

Add the video URL and then click Start Analysis.
![Youtube](https://user-images.githubusercontent.com/6216945/217701635-7c386ca7-c802-4f56-8148-dcce57555b5a.gif)

## Pytube and OpenAI Whisper
The video is downloaded with pytube, and then OpenAI Whisper takes care of transcribing and segmenting it.
![Pytube Whisper](https://user-images.githubusercontent.com/6216945/217704219-886d0afc-4181-4797-8827-82f4fd456f4f.gif)

```python
from pytube import YouTube
import whisper

# Get the audio-only stream of the video
youtube_video = YouTube(youtube_link)
stream = youtube_video.streams.filter(only_audio=True).first()
mp4_video = stream.download(filename='youtube_video.mp4')
audio_file = open(mp4_video, 'rb')

# Load the Whisper base model
model = whisper.load_model('base')

# Whisper transcription
output = model.transcribe("youtube_video.mp4")
```

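For reference, `output['segments']` is a list of dictionaries, each carrying the `start` and `end` timestamps and the `text` of one segment. A minimal sketch of inspecting it (assuming the `output` variable from the block above):

```python
# Print each Whisper segment with its timestamps
for segment in output['segments']:
    print(f"[{segment['start']:.1f}s - {segment['end']:.1f}s] {segment['text'].strip()}")
```
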
## Embedding with "text-embedding-ada-002"
We obtain the embedding vector for each segment delivered by Whisper using **text-embedding-ada-002**.
![Embedding](https://user-images.githubusercontent.com/6216945/217705008-180285d7-6bce-40c3-8601-576cc2f38171.gif)

```python
import openai
import pandas as pd

# Embeddings
openai.api_key = user_secret
data = []
segments = output['segments']
for segment in segments:
    response = openai.Embedding.create(
        input=segment["text"].strip(),
        model="text-embedding-ada-002"
    )
    embeddings = response['data'][0]['embedding']
    meta = {
        "text": segment["text"].strip(),
        "start": segment['start'],
        "end": segment['end'],
        "embedding": embeddings
    }
    data.append(meta)
pd.DataFrame(data).to_csv('word_embeddings.csv')
```
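When the CSV is read back, the `embedding` column comes out as a string rather than a list of floats. A minimal sketch of reloading it (an assumption about how the app consumes the file; app.py may differ in detail):

```python
import ast
import pandas as pd

# Parse the stringified embedding column back into lists of floats
df = pd.read_csv('word_embeddings.csv', index_col=0)
df['embedding'] = df['embedding'].apply(ast.literal_eval)
```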
## OpenAI GPT-3
We ask a question of the vectorized text: we search for the most relevant context and then send the prompt, together with that context, to the **text-davinci-003** model.

![Question1](https://user-images.githubusercontent.com/6216945/217708086-b89dce2e-e3e2-47a7-b7dd-77e402d818cb.gif)

We can even ask direct questions about what happened in the video. For example, here we ask how long the NumPy exercise that The PyCoach did in the video took.

![Question2](https://user-images.githubusercontent.com/6216945/217708485-df1edef3-d5f1-4b4a-a5c9-d08f31c80be4.gif)

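A minimal sketch of that flow, assuming the parsed `df` from above and a `user_secret` API key (the exact prompt wording is an assumption; see app.py for the real one):

```python
import openai
from openai.embeddings_utils import distances_from_embeddings

openai.api_key = user_secret
question = "What is the video about?"

# Embed the question with the same model used for the segments
q_embedding = openai.Embedding.create(
    input=question,
    model="text-embedding-ada-002"
)['data'][0]['embedding']

# Rank segments by cosine distance and keep the closest four as context
df['distances'] = distances_from_embeddings(q_embedding, df['embedding'].values, distance_metric='cosine')
context = "\n".join(df.sort_values('distances', ascending=True).head(4)['text'])

# Ask text-davinci-003 to answer from that context
prompt = f"Answer the question based on the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
response = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=256)
print(response['choices'][0]['text'].strip())
```
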
# Running Locally

1. Clone the repository

```bash
git clone https://github.com/davila7/youtube-gpt
cd youtube-gpt
```
2. Install dependencies

The following dependencies are installed from the requirements.txt file:

* streamlit
* streamlit_chat
* matplotlib
* plotly
* scipy
* sklearn
* pandas
* numpy
* git+https://github.com/openai/whisper.git
* pytube
* openai

```bash
pip install -r requirements.txt
```
3. Run the Streamlit server

```bash
streamlit run app.py
```

## Upcoming Features 🚀

- Semantic search with embedding
- Chart with emotional analysis
- Connect with Pinecone
app.py
CHANGED
@@ -180,7 +180,7 @@ with tab4:
 df['distances'] = distances_from_embeddings(q_embedding, df['embedding'].values, distance_metric='cosine')
 returns = []
 
-# Sort by distance with
+# Sort by distance with 2 hints
 for i, row in df.sort_values('distances', ascending=True).head(4).iterrows():
     # Else add it to the text that is being returned
     returns.append(row["text"])
packages.txt
ADDED
@@ -0,0 +1 @@
ffmpeg
requirements.txt
ADDED
@@ -0,0 +1,20 @@
#
# This file is autogenerated by pip-compile with python 3.10
# To update, run:
#
#    pip-compile
#
matplotlib
plotly
scipy
sklearn
scikit-learn
pandas
numpy
git+https://github.com/openai/whisper.git
pytube
streamlit
streamlit_chat
openai
pinecone-client
python-dotenv
transcription.csv
ADDED
@@ -0,0 +1,2 @@
,title,transcription
0,equivocarse," Entonces lo que tenemos que hacer es la instituci贸n es grandes, es si hacer un cambio interno, capacitarlos en las nuevas tecnolog铆as y metodolog铆a y ellos tienen que seguir haciendo su departamento tecnol贸gico internos en un proceso, pero tienen que crear una capacidad paralela de abrirse al ecosistema y de usar esta arta para crear nuevo valor y nuevos ingresos, esto no es votar tu plataforma actual, lo que no hacemos es buscar distinta alternativa, lo que hace la empresa buscar una l铆nea, voy a hacer transformaci贸n digital, voy a hacer una app, no es transformaci贸n digital hacer una app, es buscar las distintas plataformas y las distintas tecnolog铆as dentro de la organizaci贸n y probar con muchas empresas afuera y a una de esas le van a apuntar, a 10 cap谩cte no le apunten, pero el fracaso de aprendizaje, el fracaso no es fracaso, el fracaso es que aprendiste y puede seguir otro camino."
word_embeddings.csv
ADDED
The diff for this file is too large to render.