ganesh3 committed on
Commit
dbd33b2
·
1 Parent(s): feedc29

first modification

.env_template ADDED
@@ -0,0 +1 @@
+ YOUTUBE_API_KEY='YOUR YOUTUBE_API_KEY'
Dockerfile ADDED
@@ -0,0 +1,30 @@
+ # Use an official Python runtime as a parent image
+ FROM python:3.9-slim
+
+ # Set the working directory in the container
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     curl \
+     software-properties-common \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy the requirements file into the container
+ COPY requirements.txt .
+
+ # Install any needed packages specified in requirements.txt
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy the application code into the container
+ COPY app/ ./app/
+ COPY config/ ./config/
+ COPY data/ ./data/
+ COPY grafana/ ./grafana/
+
+ # Make port 8501 available to the world outside this container
+ EXPOSE 8501
+
+ # Run the Streamlit app when the container launches
+ CMD ["streamlit", "run", "app/main.py", "--server.port=8501", "--server.address=0.0.0.0"]
README.md CHANGED
@@ -1,2 +1,120 @@
- # rag-youtube-assistant
- A RAG assistant to fetch youtube transcript and build an AI chatbot on it
+ # YouTube Assistant
+
+ ## Problem Description
+
+ YouTube hosts a vast amount of video content, and extracting a specific piece of information from a long video usually means watching most of it. The problem is most acute for educational content, tutorials, and other informative videos, where key points are scattered across the full running time.
+
+ The YouTube Assistant project addresses this with a Retrieval-Augmented Generation (RAG) application that lets users query video transcripts directly, so they can find the relevant information in a video without watching it in full.
+
+ ## Data
+
+ The YouTube Assistant uses data pulled in real time through the YouTube Data API v3. The data is processed and stored in two databases:
+
+ 1. SQLite database: for structured data storage
+ 2. Elasticsearch vector database: for efficient similarity searches on embedded text
+
+ ### Data Schema
+
+ The main fields in our data structure are:
+
+ ```json
+ {
+   "content": {"type": "text"},
+   "video_id": {"type": "keyword"},
+   "segment_id": {"type": "keyword"},
+   "start_time": {"type": "float"},
+   "duration": {"type": "float"},
+   "title": {"type": "text"},
+   "author": {"type": "keyword"},
+   "upload_date": {"type": "date"},
+   "view_count": {"type": "integer"},
+   "like_count": {"type": "integer"},
+   "comment_count": {"type": "integer"},
+   "video_duration": {"type": "text"}
+ }
+ ```
+
+ This schema stores video metadata alongside each transcript segment, so queries can filter and analyze on both.
+
+ ## Functionality
+
+ The YouTube Assistant offers the following key features:
+
+ 1. **Real-time Data Extraction**: Uses the YouTube Data API v3 to fetch video data and transcripts on demand.
+
+ 2. **Efficient Data Storage**: Stores structured data in SQLite and vector embeddings in Elasticsearch for fast retrieval and similarity search.
+
+ 3. **Interactive Querying**: Provides a chat interface for asking questions about transcripts that were downloaded earlier or extracted in real time.
+
+ 4. **Contextual Understanding**: Uses RAG to retrieve the transcript segments relevant to a query and ground the answer in them.
+
+ 5. **Metadata Analysis**: Lets users query video metadata such as view counts, likes, and upload dates in addition to video content.
+
+ 6. **Time-stamped Responses**: Can point to specific video segments, including their start times and durations.
+
+ Together, these features let users extract insights and information from YouTube videos without watching them in full.
+
+ ## Project Structure
+
+ The YouTube Assistant project is organized as follows:
+
+ ```
+ youtube-rag-app/
+ ├── app/
+ │   ├── main.py
+ │   ├── ui.py
+ │   ├── transcript_extractor.py
+ │   ├── data_processor.py
+ │   ├── elasticsearch_handler.py
+ │   ├── database.py
+ │   ├── rag.py
+ │   ├── query_rewriter.py
+ │   └── evaluation.py
+ ├── data/
+ │   └── sqlite.db
+ ├── config/
+ │   └── config.yaml
+ ├── requirements.txt
+ ├── Dockerfile
+ └── docker-compose.yml
+ ```
+
+ ### Directory and File Descriptions:
+
+ - `app/`: Contains the main application code
+   - `main.py`: Entry point of the application
+   - `ui.py`: Handles the user interface
+   - `transcript_extractor.py`: Manages YouTube transcript extraction
+   - `data_processor.py`: Processes and prepares data for storage and analysis
+   - `elasticsearch_handler.py`: Manages interactions with Elasticsearch
+   - `database.py`: Handles SQLite database operations
+   - `rag.py`: Implements the Retrieval-Augmented Generation logic
+   - `query_rewriter.py`: Refines and optimizes user queries
+   - `evaluation.py`: Contains evaluation metrics and functions
+ - `data/`: Stores the SQLite database
+ - `config/`: Contains configuration files
+ - `requirements.txt`: Lists all Python dependencies
+ - `Dockerfile`: Defines the Docker image for the application
+ - `docker-compose.yml`: Orchestrates the application and its services
+
+ ## Getting Started
+
+ git clone [email protected]:ganesh3/rag-youtube-assistant.git
+
+ ## Ingestion
+
+ ## Evaluation
+
+ ## Retrieval
+
+ ### RAG Flow
+
+ ## Monitoring
+
+
+ ## Usage Examples
+
+ (Provide some example queries and interactions with the YouTube Assistant here.)
+
+ ## License
+ GPL v3
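
Note on the README's "Time-stamped Responses" feature: each segment carries `video_id` and `start_time`, which is enough to link back to the exact moment in the video. The sketch below is only an illustration; `segment_link` is a hypothetical helper that does not exist in this commit, and the sample segment values are made up.

```python
def segment_link(doc):
    """Build a YouTube URL that starts playback at the segment's start_time.

    `doc` is assumed to follow the README schema (video_id, start_time in seconds).
    """
    seconds = int(doc["start_time"])  # the t parameter takes whole seconds
    return f"https://www.youtube.com/watch?v={doc['video_id']}&t={seconds}s"


# Illustrative segment shaped like the documents in the Data Schema section
example = {
    "video_id": "zjkBMFhNj_g",
    "segment_id": "zjkBMFhNj_g_12",
    "start_time": 95.4,
    "duration": 4.2,
}
print(segment_link(example))
```

A helper like this could be applied to each retrieved segment before the answer is shown, so the user can jump to the cited passage.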
app/data_processor.py ADDED
@@ -0,0 +1,145 @@
+ from minsearch import Index
+ from sentence_transformers import SentenceTransformer
+ import numpy as np
+ from sklearn.metrics.pairwise import cosine_similarity
+ import re
+ from elasticsearch import Elasticsearch
+ import os
+
+ def clean_text(text):
+     # Remove special characters and extra whitespace
+     text = re.sub(r'[^\w\s]', '', text)
+     text = re.sub(r'\s+', ' ', text).strip()
+     return text
+
+ class DataProcessor:
+     def __init__(self, text_fields=["content", "title", "description"],
+                  keyword_fields=["video_id", "start_time", "author", "upload_date"],
+                  embedding_model="all-MiniLM-L6-v2"):
+         self.text_index = Index(text_fields=text_fields, keyword_fields=keyword_fields)
+         self.embedding_model = SentenceTransformer(embedding_model)
+         self.documents = []
+         self.embeddings = []
+
+         # Use environment variables for Elasticsearch configuration
+         elasticsearch_host = os.getenv('ELASTICSEARCH_HOST', 'localhost')
+         elasticsearch_port = int(os.getenv('ELASTICSEARCH_PORT', 9200))
+
+         # Initialize Elasticsearch client with explicit scheme
+         self.es = Elasticsearch([f'http://{elasticsearch_host}:{elasticsearch_port}'])
+
+     def process_transcript(self, video_id, transcript_data):
+         metadata = transcript_data['metadata']
+         transcript = transcript_data['transcript']
+
+         for i, segment in enumerate(transcript):
+             cleaned_text = clean_text(segment['text'])
+             doc = {
+                 "video_id": video_id,
+                 "content": cleaned_text,
+                 "start_time": segment['start'],
+                 "duration": segment['duration'],
+                 "segment_id": f"{video_id}_{i}",
+                 "title": metadata['title'],
+                 "author": metadata['author'],
+                 "upload_date": metadata['upload_date'],
+                 "view_count": metadata['view_count'],
+                 "like_count": metadata['like_count'],
+                 "comment_count": metadata['comment_count'],
+                 "video_duration": metadata['duration']
+             }
+             self.documents.append(doc)
+             self.embeddings.append(self.embedding_model.encode(cleaned_text + " " + metadata['title']))
+
+     def build_index(self, index_name):
+         self.text_index.fit(self.documents)
+         embeddings = np.array(self.embeddings)  # local array; self.embeddings stays a list so later appends still work
+
+         # Create Elasticsearch index
+         if not self.es.indices.exists(index=index_name):
+             self.es.indices.create(index=index_name, body={
+                 "mappings": {
+                     "properties": {
+                         "embedding": {"type": "dense_vector", "dims": embeddings.shape[1]},
+                         "content": {"type": "text"},
+                         "video_id": {"type": "keyword"},
+                         "segment_id": {"type": "keyword"},
+                         "start_time": {"type": "float"},
+                         "duration": {"type": "float"},
+                         "title": {"type": "text"},
+                         "author": {"type": "keyword"},
+                         "upload_date": {"type": "date"},
+                         "view_count": {"type": "integer"},
+                         "like_count": {"type": "integer"},
+                         "comment_count": {"type": "integer"},
+                         "video_duration": {"type": "text"}
+                     }
+                 }
+             })
+
+         # Index documents in Elasticsearch
+         for doc, embedding in zip(self.documents, embeddings):
+             es_doc = {**doc, 'embedding': embedding.tolist()}  # copy, so in-memory documents stay free of vectors
+             self.es.index(index=index_name, body=es_doc, id=doc['segment_id'])
+
+     def search(self, query, filter_dict={}, boost_dict={}, num_results=10, method='hybrid', index_name=None):
+         if method == 'text':
+             return self.text_search(query, filter_dict, boost_dict, num_results)
+         elif method == 'embedding':
+             return self.embedding_search(query, num_results, index_name)
+         else:  # hybrid search
+             text_results = self.text_search(query, filter_dict, boost_dict, num_results)
+             embedding_results = self.embedding_search(query, num_results, index_name)
+             return self.combine_results(text_results, embedding_results, num_results)
+
+     def text_search(self, query, filter_dict={}, boost_dict={}, num_results=10):
+         return self.text_index.search(query, filter_dict, boost_dict, num_results)
+
+     def embedding_search(self, query, num_results=10, index_name=None):
+         if index_name:
+             # Use Elasticsearch for embedding search
+             query_vector = self.embedding_model.encode(query).tolist()
+             script_query = {
+                 "script_score": {
+                     "query": {"match_all": {}},
+                     "script": {
+                         "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
+                         "params": {"query_vector": query_vector}
+                     }
+                 }
+             }
+             response = self.es.search(
+                 index=index_name,
+                 body={
+                     "size": num_results,
+                     "query": script_query,
+                     "_source": {"excludes": ["embedding"]}
+                 }
+             )
+             return [hit['_source'] for hit in response['hits']['hits']]
+         else:
+             # Use in-memory embedding search
+             query_embedding = self.embedding_model.encode(query)
+             similarities = cosine_similarity([query_embedding], np.array(self.embeddings))[0]
+             top_indices = np.argsort(similarities)[::-1][:num_results]
+             return [self.documents[i] for i in top_indices]
+
+     def combine_results(self, text_results, embedding_results, num_results):
+         combined = []
+         for i in range(max(len(text_results), len(embedding_results))):
+             if i < len(text_results):
+                 combined.append(text_results[i])
+             if i < len(embedding_results):
+                 combined.append(embedding_results[i])
+
+         seen = set()
+         deduped = []
+         for doc in combined:
+             if doc['segment_id'] not in seen:
+                 seen.add(doc['segment_id'])
+                 deduped.append(doc)
+
+         return deduped[:num_results]
+
+     def process_query(self, query):
+         return clean_text(query)
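
Note on `combine_results`: the hybrid mode does not blend scores. It alternates between the text ranking and the embedding ranking, then keeps the first occurrence of each `segment_id`. The standalone sketch below mirrors that logic on toy data so the resulting order is easy to see; the function name `interleave_and_dedupe` and the sample hits are illustrative only.

```python
def interleave_and_dedupe(text_results, embedding_results, num_results):
    """Alternate between two ranked lists, keeping the first copy of each segment."""
    combined = []
    for i in range(max(len(text_results), len(embedding_results))):
        if i < len(text_results):
            combined.append(text_results[i])
        if i < len(embedding_results):
            combined.append(embedding_results[i])

    seen = set()
    deduped = []
    for doc in combined:
        if doc["segment_id"] not in seen:
            seen.add(doc["segment_id"])
            deduped.append(doc)
    return deduped[:num_results]


# Toy rankings: "b" appears in both lists
text_hits = [{"segment_id": "a"}, {"segment_id": "b"}, {"segment_id": "c"}]
vector_hits = [{"segment_id": "b"}, {"segment_id": "d"}]
merged = interleave_and_dedupe(text_hits, vector_hits, num_results=3)
print([doc["segment_id"] for doc in merged])  # ['a', 'b', 'd']
```

Because text hits are appended first at every rank, the text ranking wins ties. A score-based fusion would behave differently and would be a separate design choice.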
app/database.py ADDED
@@ -0,0 +1,103 @@
+ import sqlite3
+ import os
+
+ class DatabaseHandler:
+     def __init__(self, db_path='data/sqlite.db'):
+         self.db_path = db_path
+         self.conn = None
+         self.create_tables()
+
+     def create_tables(self):
+         with sqlite3.connect(self.db_path) as conn:
+             cursor = conn.cursor()
+             cursor.execute('''
+                 CREATE TABLE IF NOT EXISTS videos (
+                     id INTEGER PRIMARY KEY AUTOINCREMENT,
+                     youtube_id TEXT UNIQUE,
+                     title TEXT,
+                     channel_name TEXT,
+                     processed_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+                 )
+             ''')
+             cursor.execute('''
+                 CREATE TABLE IF NOT EXISTS user_feedback (
+                     id INTEGER PRIMARY KEY AUTOINCREMENT,
+                     video_id INTEGER,
+                     query TEXT,
+                     feedback INTEGER,
+                     timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                     FOREIGN KEY (video_id) REFERENCES videos (id)
+                 )
+             ''')
+             cursor.execute('''
+                 CREATE TABLE IF NOT EXISTS embedding_models (
+                     id INTEGER PRIMARY KEY AUTOINCREMENT,
+                     model_name TEXT UNIQUE,
+                     description TEXT
+                 )
+             ''')
+             cursor.execute('''
+                 CREATE TABLE IF NOT EXISTS elasticsearch_indices (
+                     id INTEGER PRIMARY KEY AUTOINCREMENT,
+                     video_id INTEGER,
+                     index_name TEXT,
+                     embedding_model_id INTEGER,
+                     FOREIGN KEY (video_id) REFERENCES videos (id),
+                     FOREIGN KEY (embedding_model_id) REFERENCES embedding_models (id)
+                 )
+             ''')
+             conn.commit()
+
+     def add_video(self, youtube_id, title, channel_name):
+         with sqlite3.connect(self.db_path) as conn:
+             cursor = conn.cursor()
+             cursor.execute('''
+                 INSERT OR IGNORE INTO videos (youtube_id, title, channel_name)
+                 VALUES (?, ?, ?)
+             ''', (youtube_id, title, channel_name))
+             conn.commit()
+             return cursor.execute('SELECT id FROM videos WHERE youtube_id = ?', (youtube_id,)).fetchone()[0]  # lastrowid is unreliable when the insert is ignored
+
+     def add_user_feedback(self, video_id, query, feedback):
+         with sqlite3.connect(self.db_path) as conn:
+             cursor = conn.cursor()
+             cursor.execute('''
+                 INSERT INTO user_feedback (video_id, query, feedback)
+                 VALUES (?, ?, ?)
+             ''', (video_id, query, feedback))
+             conn.commit()
+
+     def add_embedding_model(self, model_name, description):
+         with sqlite3.connect(self.db_path) as conn:
+             cursor = conn.cursor()
+             cursor.execute('''
+                 INSERT OR IGNORE INTO embedding_models (model_name, description)
+                 VALUES (?, ?)
+             ''', (model_name, description))
+             conn.commit()
+             return cursor.execute('SELECT id FROM embedding_models WHERE model_name = ?', (model_name,)).fetchone()[0]  # lastrowid is unreliable when the insert is ignored
+
+     def add_elasticsearch_index(self, video_id, index_name, embedding_model_id):
+         with sqlite3.connect(self.db_path) as conn:
+             cursor = conn.cursor()
+             cursor.execute('''
+                 INSERT INTO elasticsearch_indices (video_id, index_name, embedding_model_id)
+                 VALUES (?, ?, ?)
+             ''', (video_id, index_name, embedding_model_id))
+             conn.commit()
+
+     def get_video_by_youtube_id(self, youtube_id):
+         with sqlite3.connect(self.db_path) as conn:
+             cursor = conn.cursor()
+             cursor.execute('SELECT * FROM videos WHERE youtube_id = ?', (youtube_id,))
+             return cursor.fetchone()
+
+     def get_elasticsearch_index(self, video_id, embedding_model_id):
+         with sqlite3.connect(self.db_path) as conn:
+             cursor = conn.cursor()
+             cursor.execute('''
+                 SELECT index_name FROM elasticsearch_indices
+                 WHERE video_id = ? AND embedding_model_id = ?
+             ''', (video_id, embedding_model_id))
+             result = cursor.fetchone()
+             return result[0] if result else None
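
Note on `add_video` and `add_embedding_model`: both use `INSERT OR IGNORE`, so a second call with the same unique key inserts nothing. In that case the cursor's `lastrowid` does not identify the existing row, which is why looking the id up by its unique key is the safer pattern. The sketch below shows this against an in-memory database; the simplified table and the standalone `add_video` function are assumptions made for illustration, not the commit's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos (id INTEGER PRIMARY KEY AUTOINCREMENT, youtube_id TEXT UNIQUE, title TEXT)"
)


def add_video(conn, youtube_id, title):
    """Insert the video if it is new, then return its row id either way."""
    conn.execute(
        "INSERT OR IGNORE INTO videos (youtube_id, title) VALUES (?, ?)",
        (youtube_id, title),
    )
    conn.commit()
    row = conn.execute(
        "SELECT id FROM videos WHERE youtube_id = ?", (youtube_id,)
    ).fetchone()
    return row[0]


first = add_video(conn, "zjkBMFhNj_g", "Example title")
second = add_video(conn, "zjkBMFhNj_g", "Example title")  # ignored: youtube_id already exists
count = conn.execute("SELECT COUNT(*) FROM videos").fetchone()[0]
print(first == second, count)  # True 1
```

Callers then get a stable id whether or not the video was already stored, which matters when that id is used as a foreign key.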
app/elasticsearch_handler.py ADDED
@@ -0,0 +1,37 @@
+ from elasticsearch import Elasticsearch
+ import uuid
+
+ class ElasticsearchHandler:
+     def __init__(self, host='localhost', port=9200):
+         self.es = Elasticsearch([f'http://{host}:{port}'])  # explicit scheme, matching DataProcessor
+
+     def create_index(self, index_name):
+         if not self.es.indices.exists(index=index_name):
+             self.es.indices.create(index=index_name)
+
+     def index_document(self, index_name, doc_id, text, embedding):
+         body = {
+             'text': text,
+             'embedding': embedding.tolist()
+         }
+         self.es.index(index=index_name, id=doc_id, body=body)
+
+     def search(self, index_name, query_vector, top_k=5):
+         script_query = {
+             "script_score": {
+                 "query": {"match_all": {}},
+                 "script": {
+                     "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
+                     "params": {"query_vector": query_vector.tolist()}
+                 }
+             }
+         }
+         response = self.es.search(
+             index=index_name,
+             body={
+                 "size": top_k,
+                 "query": script_query,
+                 "_source": {"includes": ["text"]}
+             }
+         )
+         return [hit["_source"]["text"] for hit in response["hits"]["hits"]]
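
Note on `create_index` and `search`: the `cosineSimilarity` script needs `embedding` to be a `dense_vector` field, and `create_index` here creates the index without a mapping. Declaring the mapping explicitly, as `DataProcessor.build_index` does, avoids relying on dynamic mapping. The sketch below only builds the two request bodies as plain dictionaries; `build_mapping` and `build_cosine_query` are hypothetical helper names, and 384 is the embedding size of `all-MiniLM-L6-v2`.

```python
def build_mapping(dims):
    """Index body with a dense_vector field sized for the embedding model."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def build_cosine_query(query_vector, top_k=5):
    """Search body that scores every document by cosine similarity to query_vector."""
    return {
        "size": top_k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 shifts cosine's [-1, 1] range to [0, 2]; scores must not be negative
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
        "_source": {"includes": ["text"]},
    }


body = build_cosine_query([0.1, 0.2, 0.3], top_k=2)
mapping = build_mapping(384)
print(body["size"], mapping["mappings"]["properties"]["embedding"])
```

Keeping the bodies in small pure functions also makes them easy to unit-test without a running Elasticsearch node.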
app/evaluation.py ADDED
@@ -0,0 +1,56 @@
+ from sklearn.metrics.pairwise import cosine_similarity
+ import numpy as np
+ import sqlite3
+
+ class EvaluationSystem:
+     def __init__(self, data_processor, database_handler):
+         self.data_processor = data_processor
+         self.db_handler = database_handler
+
+     def embed(self, text):
+         # process_query() only cleans text; the vector comes from the sentence-transformer model
+         return self.data_processor.embedding_model.encode(self.data_processor.process_query(text))
+
+     def relevance_scoring(self, query, retrieved_docs, top_k=5):
+         if not retrieved_docs:
+             return 0.0
+         query_embedding = self.embed(query)
+         doc_embeddings = [self.embed(doc) for doc in retrieved_docs]
+
+         similarities = cosine_similarity([query_embedding], doc_embeddings)[0]
+         return np.mean(sorted(similarities, reverse=True)[:top_k])
+
+     def answer_similarity(self, generated_answer, reference_answer):
+         gen_embedding = self.embed(generated_answer)
+         ref_embedding = self.embed(reference_answer)
+         return cosine_similarity([gen_embedding], [ref_embedding])[0][0]
+
+     def human_evaluation(self, video_id, query):
+         # DatabaseHandler.conn is never opened, so connect through db_path instead
+         with sqlite3.connect(self.db_handler.db_path) as conn:
+             cursor = conn.cursor()
+             cursor.execute('''
+                 SELECT AVG(feedback) FROM user_feedback
+                 WHERE video_id = ? AND query = ?
+             ''', (video_id, query))
+             result = cursor.fetchone()
+             return result[0] if result[0] is not None else 0
+
+     def evaluate_rag_performance(self, rag_system, test_queries, reference_answers, index_name):
+         relevance_scores = []
+         similarity_scores = []
+         human_scores = []
+
+         for query, reference in zip(test_queries, reference_answers):
+             retrieved_docs = rag_system.es_handler.search(index_name, self.embed(query))
+             generated_answer = rag_system.query(index_name, query)
+
+             relevance_scores.append(self.relevance_scoring(query, retrieved_docs))
+             similarity_scores.append(self.answer_similarity(generated_answer, reference))
+             human_scores.append(self.human_evaluation(index_name, query))  # Assuming index_name can be used as video_id
+
+         return {
+             "avg_relevance_score": np.mean(relevance_scores),
+             "avg_similarity_score": np.mean(similarity_scores),
+             "avg_human_score": np.mean(human_scores)
+         }
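
Note on `relevance_scoring`: the metric is the mean of the `top_k` highest cosine similarities between the query vector and the retrieved document vectors, so it only makes sense on numeric embeddings. The dependency-free sketch below reproduces the calculation on tiny hand-made vectors; `cosine` and `mean_top_k_similarity` are illustrative names, and real inputs would be sentence-transformer embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_top_k_similarity(query_vec, doc_vecs, top_k=5):
    """Average of the top_k highest query-document similarities (0.0 if no documents)."""
    if not doc_vecs:
        return 0.0
    scores = sorted((cosine(query_vec, d) for d in doc_vecs), reverse=True)[:top_k]
    return sum(scores) / len(scores)


query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score = mean_top_k_similarity(query, docs, top_k=2)
print(round(score, 4))  # 0.8536: mean of 1.0 and 1/sqrt(2)
```

With `top_k=2` the orthogonal document is ignored, which shows how the metric rewards the best matches and disregards weak ones.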
app/generate_ground_truth.py ADDED
@@ -0,0 +1,72 @@
+ import os
+ import pandas as pd
+ import json
+ from youtube_transcript_api import YouTubeTranscriptApi
+ from tqdm import tqdm
+ import requests
+
+ OLLAMA_HOST = os.getenv('OLLAMA_HOST', 'localhost')
+ OLLAMA_PORT = os.getenv('OLLAMA_PORT', '11434')
+
+ def get_transcript(video_id):
+     try:
+         transcript = YouTubeTranscriptApi.get_transcript(video_id)
+         return " ".join([entry['text'] for entry in transcript])
+     except Exception as e:
+         print(f"Error extracting transcript for video {video_id}: {str(e)}")
+         return None
+
+ def generate_questions(transcript):
+     prompt_template = """
+ You are an AI assistant tasked with generating questions based on a YouTube video transcript.
+ Formulate 10 questions that a user might ask based on the provided transcript.
+ Make the questions specific to the content of the transcript.
+ The questions should be complete and not too short. Use as few words as possible from the transcript.
+
+ The transcript:
+
+ {transcript}
+
+ Provide the output in parsable JSON without using code blocks:
+
+ {{"questions": ["question1", "question2", ..., "question10"]}}
+ """.strip()
+
+     prompt = prompt_template.format(transcript=transcript)
+
+     response = requests.post(f'http://{OLLAMA_HOST}:{OLLAMA_PORT}/api/generate', json={
+         'model': 'phi3.5',
+         'prompt': prompt,
+         'stream': False  # ask Ollama for one JSON object instead of a stream of chunks
+     })
+
+     if response.status_code == 200:
+         try:
+             return json.loads(response.json()['response'])
+         except (ValueError, KeyError) as e:
+             print(f"Error parsing model output: {str(e)}")
+             return None
+     else:
+         print(f"Error: {response.status_code} - {response.text}")
+         return None
+
+ def main():
+     video_id = "zjkBMFhNj_g"
+     transcript = get_transcript(video_id)
+
+     if transcript:
+         questions = generate_questions(transcript)
+
+         if questions:
+             df = pd.DataFrame([(video_id, q) for q in questions['questions']], columns=['video_id', 'question'])
+
+             os.makedirs('data', exist_ok=True)
+             df.to_csv('data/ground-truth-retrieval.csv', index=False)
+             print("Ground truth data saved to data/ground-truth-retrieval.csv")
+         else:
+             print("Failed to generate questions.")
+     else:
+         print("Failed to generate ground truth data due to transcript retrieval error.")
+
+ if __name__ == "__main__":
+     main()
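
Note on `generate_questions`: the script trusts the model to return bare JSON, but small local models often wrap the reply in a Markdown fence or add stray text. A defensive parser, sketched below, returns `None` instead of raising when the reply is unusable. `parse_questions` is a hypothetical helper that is not part of this commit, and the sample replies are made up.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the snippet fence-safe


def parse_questions(raw):
    """Return the list under "questions" from a model reply, or None if it cannot be parsed."""
    text = raw.strip()
    # Models sometimes wrap JSON in a Markdown fence despite the prompt; drop the fence lines
    if text.startswith(FENCE):
        lines = [line for line in text.splitlines() if not line.startswith(FENCE)]
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    questions = data.get("questions") if isinstance(data, dict) else None
    return questions if isinstance(questions, list) else None


print(parse_questions('{"questions": ["What is RAG?"]}'))  # ['What is RAG?']
print(parse_questions("not json"))  # None
```

Ollama's generate endpoint also accepts a `format` field set to `json`, which may reduce the need for this cleanup; whether `phi3.5` then follows the requested schema reliably is not something this commit shows.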
app/main.py ADDED
@@ -0,0 +1,287 @@
+ import streamlit as st
+ import pandas as pd
+ from transcript_extractor import extract_video_id, get_transcript, get_channel_videos, process_videos
+ from data_processor import DataProcessor
+ from database import DatabaseHandler
+ from rag import RAGSystem
+ from query_rewriter import QueryRewriter
+ from evaluation import EvaluationSystem
+ from sentence_transformers import SentenceTransformer
+ import os
+ import json
+ import requests
+ from tqdm import tqdm
+ import sqlite3
+
+ # Initialize components
+ @st.cache_resource
+ def init_components():
+     db_handler = DatabaseHandler()
+     data_processor = DataProcessor()
+     rag_system = RAGSystem(data_processor)
+     query_rewriter = QueryRewriter()
+     evaluation_system = EvaluationSystem(data_processor, db_handler)
+     return db_handler, data_processor, rag_system, query_rewriter, evaluation_system
+
+ db_handler, data_processor, rag_system, query_rewriter, evaluation_system = init_components()
+
+ # Ground Truth Generation
+ def generate_questions(transcript):
+     OLLAMA_HOST = os.getenv('OLLAMA_HOST', 'localhost')
+     OLLAMA_PORT = os.getenv('OLLAMA_PORT', '11434')
+     prompt_template = """
+ You are an AI assistant tasked with generating questions based on a YouTube video transcript.
+ Formulate 10 questions that a user might ask based on the provided transcript.
+ Make the questions specific to the content of the transcript.
+ The questions should be complete and not too short. Use as few words as possible from the transcript.
+
+ The transcript:
+
+ {transcript}
+
+ Provide the output in parsable JSON without using code blocks:
+
+ {{"questions": ["question1", "question2", ..., "question10"]}}
+ """.strip()
+
+     prompt = prompt_template.format(transcript=transcript)
+
+     try:
+         response = requests.post(
+             f'http://{OLLAMA_HOST}:{OLLAMA_PORT}/api/generate',
+             json={'model': 'phi3.5', 'prompt': prompt, 'stream': False},  # one JSON reply, not a stream
+         )
+         response.raise_for_status()
+         return json.loads(response.json()['response'])
+     except (requests.RequestException, ValueError, KeyError) as e:
+         st.error(f"Error generating questions: {str(e)}")
+         return None
+
+ def generate_ground_truth(video_id):
+     transcript_data = get_transcript(video_id)
+
+     if transcript_data and 'transcript' in transcript_data:
+         full_transcript = " ".join([entry['text'] for entry in transcript_data['transcript']])
+         questions = generate_questions(full_transcript)
+
+         if questions and 'questions' in questions:
+             df = pd.DataFrame([(video_id, q) for q in questions['questions']], columns=['video_id', 'question'])
+
+             os.makedirs('data', exist_ok=True)
+             df.to_csv('data/ground-truth-retrieval.csv', index=False)
+             st.success("Ground truth data generated and saved to data/ground-truth-retrieval.csv")
+             return df
+         else:
+             st.error("Failed to generate questions.")
+     else:
+         st.error("Failed to generate ground truth data due to transcript retrieval error.")
+     return None
+
+ # RAG Evaluation
+ def evaluate_rag(sample_size=200):
+     try:
+         ground_truth = pd.read_csv('data/ground-truth-retrieval.csv')
+     except FileNotFoundError:
+         st.error("Ground truth file not found. Please generate ground truth data first.")
+         return None
+
+     sample = ground_truth.sample(n=min(sample_size, len(ground_truth)), random_state=1)
+     evaluations = []
+
+     prompt_template = """
+ You are an expert evaluator for a YouTube transcript assistant.
+ Your task is to analyze the relevance of the generated answer to the given question.
+ Based on the relevance of the generated answer, you will classify it
+ as "NON_RELEVANT", "PARTLY_RELEVANT", or "RELEVANT".
+
+ Here is the data for evaluation:
+
+ Question: {question}
+ Generated Answer: {answer_llm}
+
+ Please analyze the content and context of the generated answer in relation to the question
+ and provide your evaluation in parsable JSON without using code blocks:
+
+ {{
+   "Relevance": "NON_RELEVANT" | "PARTLY_RELEVANT" | "RELEVANT",
+   "Explanation": "[Provide a brief explanation for your evaluation]"
+ }}
+ """.strip()
+
+     progress_bar = st.progress(0)
+     for i, (_, row) in enumerate(sample.iterrows()):
+         question = row['question']
+         answer_llm = rag_system.query(question)
+         prompt = prompt_template.format(question=question, answer_llm=answer_llm)
+         evaluation = rag_system.query(prompt)  # Assuming rag_system can handle this type of query
+         try:
+             evaluation_json = json.loads(evaluation)
+             evaluations.append((row['video_id'], question, answer_llm, evaluation_json['Relevance'], evaluation_json['Explanation']))
+         except (json.JSONDecodeError, KeyError, TypeError):
+             st.warning(f"Failed to parse evaluation for question: {question}")
+         progress_bar.progress((i + 1) / len(sample))
+
+     # Store RAG evaluations in the database
+     conn = sqlite3.connect('data/sqlite.db')
+     cursor = conn.cursor()
+     cursor.execute('''
+         CREATE TABLE IF NOT EXISTS rag_evaluations (
+             video_id TEXT,
+             question TEXT,
+             answer TEXT,
+             relevance TEXT,
+             explanation TEXT
+         )
+     ''')
+     cursor.executemany('''
+         INSERT INTO rag_evaluations (video_id, question, answer, relevance, explanation)
+         VALUES (?, ?, ?, ?, ?)
+     ''', evaluations)
+     conn.commit()
+     conn.close()
+
+     st.success("Evaluation complete. Results stored in the database.")
+     return evaluations
+
+ def main():
+     st.title("YouTube Transcript RAG System")
+
+     tab1, tab2, tab3 = st.tabs(["RAG System", "Ground Truth Generation", "Evaluation"])
+
+     with tab1:
+         st.header("RAG System")
+         # Input section
+         input_type = st.radio("Select input type:", ["Video URL", "Channel URL", "YouTube ID"])
+         input_value = st.text_input("Enter the URL or ID:")
+         embedding_model = st.selectbox("Select embedding model:", ["all-MiniLM-L6-v2", "all-mpnet-base-v2"])
+
+         if st.button("Process"):
+             with st.spinner("Processing..."):
+                 data_processor.embedding_model = SentenceTransformer(embedding_model)
+                 if input_type == "Video URL":
+                     video_id = extract_video_id(input_value)
+                     if video_id:
+                         process_single_video(video_id, embedding_model)
+                     else:
+                         st.error("Failed to extract video ID from the URL")
+                 elif input_type == "Channel URL":
+                     channel_videos = get_channel_videos(input_value)
+                     if channel_videos:
+                         process_multiple_videos([video['video_id'] for video in channel_videos], embedding_model)
+                     else:
+                         st.error("Failed to retrieve videos from the channel")
+                 else:
+                     process_single_video(input_value, embedding_model)
+
+         # Query section
+         st.subheader("Query the RAG System")
+         query = st.text_input("Enter your query:")
+         rewrite_method = st.radio("Query rewriting method:", ["None", "Chain of Thought", "ReAct"])
+         search_method = st.radio("Search method:", ["Hybrid", "Text-only", "Embedding-only"])
+
+         if st.button("Search"):
+             with st.spinner("Searching..."):
+                 if rewrite_method == "Chain of Thought":
+                     query = query_rewriter.rewrite_cot(query)
+                 elif rewrite_method == "ReAct":
+                     query = query_rewriter.rewrite_react(query)
+
+                 search_method_map = {"Hybrid": "hybrid", "Text-only": "text", "Embedding-only": "embedding"}
+                 response = rag_system.query(query, search_method=search_method_map[search_method])
+                 st.write("Response:", response)
+
+                 # Feedback
+                 feedback = st.radio("Provide feedback:", ["+1", "-1"])
+                 if st.button("Submit Feedback"):
+                     db_handler.add_user_feedback("all_videos", query, 1 if feedback == "+1" else -1)
+                     st.success("Feedback submitted successfully!")
+
+     with tab2:
+         st.header("Ground Truth Generation")
+         video_id = st.text_input("Enter YouTube Video ID for ground truth generation:")
+         if st.button("Generate Ground Truth"):
+             with st.spinner("Generating ground truth..."):
+                 ground_truth_df = generate_ground_truth(video_id)
+                 if ground_truth_df is not None:
+                     st.dataframe(ground_truth_df)
+                     csv = ground_truth_df.to_csv(index=False)
+                     st.download_button(
+                         label="Download Ground Truth CSV",
+                         data=csv,
+                         file_name="ground_truth.csv",
+                         mime="text/csv",
+                     )
+
+     with tab3:
+         st.header("RAG Evaluation")
+         sample_size = st.number_input("Enter sample size for evaluation:", min_value=1, max_value=1000, value=200)
+         if st.button("Run Evaluation"):
+             with st.spinner("Running evaluation..."):
+                 evaluation_results = evaluate_rag(sample_size)
+                 if evaluation_results:
+                     st.write("Evaluation Results:")
+                     st.dataframe(pd.DataFrame(evaluation_results, columns=['Video ID', 'Question', 'Answer', 'Relevance', 'Explanation']))
+
+ @st.cache_data
+ def process_single_video(video_id, embedding_model):
+     # Check if the video has already been processed with the current embedding model
+     existing_index = db_handler.get_elasticsearch_index(video_id, embedding_model)
+     if existing_index:
+         st.info(f"Video {video_id} has already been processed with {embedding_model}. Using existing index: {existing_index}")
+         return existing_index
+
+     transcript_data = get_transcript(video_id)
+     if transcript_data:
+         # Store video metadata in the database
+         video_data = {
+             'video_id': video_id,
+             'title': transcript_data['metadata'].get('title', 'Unknown Title'),
+             'author': transcript_data['metadata'].get('author', 'Unknown Author'),
+             'upload_date': transcript_data['metadata'].get('upload_date', 'Unknown Date'),
+             'view_count': int(transcript_data['metadata'].get('view_count', 0)),
+             'like_count': int(transcript_data['metadata'].get('like_count', 0)),
+             'comment_count': int(transcript_data['metadata'].get('comment_count', 0)),
+             'video_duration': transcript_data['metadata'].get('duration', 'Unknown Duration')
+         }
+         db_handler.add_video(video_id, video_data['title'], video_data['author'])  # matches DatabaseHandler.add_video(youtube_id, title, channel_name)
+
+         # Store transcript segments in the database
+         for i, segment in enumerate(transcript_data['transcript']):
+             segment_data = {
+                 'segment_id': f"{video_id}_{i}",
+                 'video_id': video_id,
+                 'content': segment.get('text', ''),
+                 'start_time': segment.get('start', 0),
+                 'duration': segment.get('duration', 0)
+             }
+             db_handler.add_transcript_segment(segment_data)  # NOTE: database.py in this commit defines no add_transcript_segment()
+
+         # Process transcript for RAG system
+         data_processor.process_transcript(video_id, transcript_data)
+
+         # Create Elasticsearch index
+         index_name = f"video_{video_id}_{embedding_model}"
+         data_processor.build_index(index_name)
+
+         # Store Elasticsearch index information
+         db_handler.add_elasticsearch_index(video_id, index_name, embedding_model)
+
+         st.success(f"Processed and indexed transcript for video {video_id}")
+         st.write("Metadata:", transcript_data['metadata'])
+         return index_name
+ else:
273
+ st.error(f"Failed to retrieve transcript for video {video_id}")
274
+ return None
275
+
276
+ @st.cache_data
277
+ def process_multiple_videos(video_ids, embedding_model):
278
+ indices = []
279
+ for video_id in video_ids:
280
+ index = process_single_video(video_id, embedding_model)
281
+ if index:
282
+ indices.append(index)
283
+ st.success(f"Processed and indexed transcripts for {len(indices)} videos")
284
+ return indices
285
+
286
+ if __name__ == "__main__":
287
+ main()
app/minsearch.py ADDED
@@ -0,0 +1,96 @@
1
+ import pandas as pd
2
+
3
+ from sklearn.feature_extraction.text import TfidfVectorizer
4
+ from sklearn.metrics.pairwise import cosine_similarity
5
+
6
+ import numpy as np
7
+
8
+
9
+ class Index:
10
+ """
11
+ A simple search index using TF-IDF and cosine similarity for text fields and exact matching for keyword fields.
12
+
13
+ Attributes:
14
+ text_fields (list): List of text field names to index.
15
+ keyword_fields (list): List of keyword field names to index.
16
+ vectorizers (dict): Dictionary of TfidfVectorizer instances for each text field.
17
+ keyword_df (pd.DataFrame): DataFrame containing keyword field data.
18
+ text_matrices (dict): Dictionary of TF-IDF matrices for each text field.
19
+ docs (list): List of documents indexed.
20
+ """
21
+
22
+ def __init__(self, text_fields, keyword_fields, vectorizer_params={}):
23
+ """
24
+ Initializes the Index with specified text and keyword fields.
25
+
26
+ Args:
27
+ text_fields (list): List of text field names to index.
28
+ keyword_fields (list): List of keyword field names to index.
29
+ vectorizer_params (dict): Optional parameters to pass to TfidfVectorizer.
30
+ """
31
+ self.text_fields = text_fields
32
+ self.keyword_fields = keyword_fields
33
+
34
+ self.vectorizers = {field: TfidfVectorizer(**vectorizer_params) for field in text_fields}
35
+ self.keyword_df = None
36
+ self.text_matrices = {}
37
+ self.docs = []
38
+
39
+ def fit(self, docs):
40
+ """
41
+ Fits the index with the provided documents.
42
+
43
+ Args:
44
+ docs (list of dict): List of documents to index. Each document is a dictionary.
45
+ """
46
+ self.docs = docs
47
+ keyword_data = {field: [] for field in self.keyword_fields}
48
+
49
+ for field in self.text_fields:
50
+ texts = [doc.get(field, '') for doc in docs]
51
+ self.text_matrices[field] = self.vectorizers[field].fit_transform(texts)
52
+
53
+ for doc in docs:
54
+ for field in self.keyword_fields:
55
+ keyword_data[field].append(doc.get(field, ''))
56
+
57
+ self.keyword_df = pd.DataFrame(keyword_data)
58
+
59
+ return self
60
+
61
+ def search(self, query, filter_dict={}, boost_dict={}, num_results=10):
62
+ """
63
+ Searches the index with the given query, filters, and boost parameters.
64
+
65
+ Args:
66
+ query (str): The search query string.
67
+ filter_dict (dict): Dictionary of keyword fields to filter by. Keys are field names and values are the values to filter by.
68
+ boost_dict (dict): Dictionary of boost scores for text fields. Keys are field names and values are the boost scores.
69
+ num_results (int): The number of top results to return. Defaults to 10.
70
+
71
+ Returns:
72
+ list of dict: List of documents matching the search criteria, ranked by relevance.
73
+ """
74
+ query_vecs = {field: self.vectorizers[field].transform([query]) for field in self.text_fields}
75
+ scores = np.zeros(len(self.docs))
76
+
77
+ # Compute cosine similarity for each text field and apply boost
78
+ for field, query_vec in query_vecs.items():
79
+ sim = cosine_similarity(query_vec, self.text_matrices[field]).flatten()
80
+ boost = boost_dict.get(field, 1)
81
+ scores += sim * boost
82
+
83
+ # Apply keyword filters
84
+ for field, value in filter_dict.items():
85
+ if field in self.keyword_fields:
86
+ mask = self.keyword_df[field] == value
87
+ scores = scores * mask.to_numpy()
88
+
89
+ # Use argpartition to get top num_results indices
90
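+ num_results = min(num_results, len(scores)) # argpartition raises if asked for more results than documents
+ top_indices = np.argpartition(scores, -num_results)[-num_results:]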
91
+ top_indices = top_indices[np.argsort(-scores[top_indices])]
92
+
93
+ # Filter out zero-score results
94
+ top_docs = [self.docs[i] for i in top_indices if scores[i] > 0]
95
+
96
+ return top_docs
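The scoring logic in `Index.search` (boosted per-field similarity, a keyword filter mask, then a top-k cut that drops zero scores) can be illustrated without scikit-learn. The following is a minimal sketch in plain Python, not the module's implementation: the helper names `combine_scores` and `top_k_nonzero` are hypothetical, and `heapq.nlargest` stands in for `np.argpartition`.

```python
import heapq


def combine_scores(field_sims, boosts, keep_mask):
    """Sum boosted per-field similarities, then zero out filtered documents.

    field_sims: dict mapping field name -> list of similarity scores per document
    boosts:     dict mapping field name -> boost weight (defaults to 1.0)
    keep_mask:  list of booleans, False for documents removed by a keyword filter
    """
    total = [0.0] * len(keep_mask)
    for field, sims in field_sims.items():
        weight = boosts.get(field, 1.0)
        for i, sim in enumerate(sims):
            total[i] += weight * sim
    return [score if keep else 0.0 for score, keep in zip(total, keep_mask)]


def top_k_nonzero(scores, k):
    """Return indices of the k highest positive scores, best first."""
    k = max(0, min(k, len(scores)))  # clamp so k never exceeds the document count
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [i for i in best if scores[i] > 0]


scores = combine_scores(
    {"content": [0.2, 0.0, 0.9, 0.5]},
    {"content": 2.0},
    [True, True, True, False],
)
print(top_k_nonzero(scores, 10))
```

Clamping `k` matters because the index may hold fewer documents than the requested number of results.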
app/query_rewriter.py ADDED
@@ -0,0 +1,33 @@
1
+ import ollama
2
+
3
+ class QueryRewriter:
4
+ def __init__(self):
5
+ self.model = "phi3.5" # Using Phi-3.5 model (same Ollama tag as app/rag.py)
6
+
7
+ def rewrite_cot(self, query):
8
+ prompt = f"""
9
+ Rewrite the following query using Chain-of-Thought reasoning:
10
+ Query: {query}
11
+
12
+ Rewritten query:
13
+ """
14
+ response = ollama.generate(model=self.model, prompt=prompt)
15
+ return response['response'].strip()
16
+
17
+ def rewrite_react(self, query):
18
+ prompt = f"""
19
+ Rewrite the following query using the ReAct framework (Reasoning and Acting):
20
+ Query: {query}
21
+
22
+ Thought 1:
23
+ Action 1:
24
+ Observation 1:
25
+
26
+ Thought 2:
27
+ Action 2:
28
+ Observation 2:
29
+
30
+ Final rewritten query:
31
+ """
32
+ response = ollama.generate(model=self.model, prompt=prompt)
33
+ return response['response'].strip()
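Because the ReAct prompt asks the model to write out thoughts, actions and observations before the final line, the raw response may contain the whole trace rather than just the rewritten query. One possible way to handle this, sketched under the assumption that the model echoes the "Final rewritten query:" marker, is to keep only the text after it. The helper name `extract_final_query` is hypothetical and not part of the commit.

```python
import re

# Everything after the marker; DOTALL lets the capture span multiple lines.
FINAL_MARKER = re.compile(r"final rewritten query\s*:\s*(.*)", re.IGNORECASE | re.DOTALL)


def extract_final_query(response_text, fallback):
    """Return the first non-empty line after the marker, or `fallback` if none is found."""
    match = FINAL_MARKER.search(response_text)
    if not match:
        return fallback
    for line in match.group(1).splitlines():
        cleaned = line.strip().strip('"')
        if cleaned:
            return cleaned
    return fallback


trace = (
    "Thought 1: the user wants an overview\n"
    "Action 1: simplify the wording\n"
    "Observation 1: done\n\n"
    "Final rewritten query: What is retrieval-augmented generation?\n"
)
print(extract_final_query(trace, "what is rag"))
```

Falling back to the original query keeps the search usable when the model ignores the requested format.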
app/rag.py ADDED
@@ -0,0 +1,31 @@
1
+ import ollama
2
+
3
+ class RAGSystem:
4
+ def __init__(self, data_processor):
5
+ self.data_processor = data_processor
6
+ self.model = "phi3.5" # Using Phi-3.5 model
7
+
8
+ def query(self, user_query, top_k=3, search_method='hybrid'):
9
+ # Retrieve relevant documents using the specified search method
10
+ relevant_docs = self.data_processor.search(user_query, num_results=top_k, method=search_method)
11
+
12
+ # Construct the prompt
13
+ context = "\n".join([doc['content'] for doc in relevant_docs])
14
+ prompt = f"Context: {context}\n\nQuestion: {user_query}\n\nAnswer:"
15
+
16
+ # Generate response using Ollama
17
+ response = ollama.generate(model=self.model, prompt=prompt)
18
+
19
+ return response['response']
20
+
21
+ def rerank_documents(self, documents, query):
22
+ # Implement a simple re-ranking strategy
23
+ # This could be improved with more sophisticated methods
24
+ reranked = sorted(documents, key=lambda doc: self.calculate_relevance(doc['content'], query), reverse=True)
25
+ return reranked
26
+
27
+ def calculate_relevance(self, document, query):
28
+ # Simple relevance calculation based on word overlap
29
+ doc_words = set(document.lower().split())
30
+ query_words = set(query.lower().split())
31
+ return len(doc_words.intersection(query_words)) / len(query_words) if query_words else 0.0
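The word-overlap re-ranking in `RAGSystem` can be tried in isolation. This is a small standalone sketch of the same idea (the fraction of query words that appear in a document), written as free functions rather than methods; the sample documents are made up for illustration.

```python
def calculate_relevance(document, query):
    """Fraction of distinct query words that also occur in the document."""
    doc_words = set(document.lower().split())
    query_words = set(query.lower().split())
    if not query_words:  # an empty query would otherwise divide by zero
        return 0.0
    return len(doc_words & query_words) / len(query_words)


def rerank_documents(documents, query):
    """Sort documents by descending word overlap with the query."""
    return sorted(
        documents,
        key=lambda doc: calculate_relevance(doc["content"], query),
        reverse=True,
    )


docs = [
    {"content": "cats sleep a lot"},
    {"content": "how to train a neural network"},
    {"content": "train schedules"},
]
ranked = rerank_documents(docs, "train neural network")
print([doc["content"] for doc in ranked])
```

Splitting on whitespace means punctuation stays attached to words, so "network." and "network" would not match. A tokenizer would be the natural next step if that becomes a problem.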
app/rag_evaluation.py ADDED
@@ -0,0 +1,193 @@
1
+ import pandas as pd
2
+ import numpy as np
3
+ from tqdm import tqdm
4
+ import json
5
+ import requests
6
+ import sqlite3
7
+ from minsearch import Index
8
+
9
+ # Database connection
10
+ conn = sqlite3.connect('data/sqlite.db')
11
+ cursor = conn.cursor()
12
+
13
+ # Load ground truth data from CSV
14
+ def load_ground_truth():
15
+ return pd.read_csv('data/ground-truth-retrieval.csv')
16
+
17
+ ground_truth = load_ground_truth()
18
+
19
+ # Load transcript data
20
+ def load_transcripts():
21
+ cursor.execute("SELECT * FROM transcript_segments")
22
+ rows = cursor.fetchall()
23
+ return pd.DataFrame(rows, columns=['segment_id', 'video_id', 'content', 'start_time', 'duration'])
24
+
25
+ transcripts = load_transcripts()
26
+
27
+ # Create index
28
+ index = Index(
29
+ text_fields=['content'],
30
+ keyword_fields=['video_id', 'segment_id']
31
+ )
32
+ index.fit(transcripts.to_dict('records'))
33
+
34
+ # RAG flow
35
+ def search(query, boost=None):
36
+ boost = boost or {} # objective() passes boost weights; default to no boosting
37
+ results = index.search(
38
+ query=query,
39
+ filter_dict={},
40
+ boost_dict=boost,
41
+ num_results=10
42
+ )
43
+ return results
44
+
45
+ prompt_template = """
46
+ You're an AI assistant for YouTube video transcripts. Answer the QUESTION based on the CONTEXT from our transcript database.
47
+ Use only the facts from the CONTEXT when answering the QUESTION.
48
+
49
+ QUESTION: {question}
50
+
51
+ CONTEXT:
52
+ {context}
53
+ """.strip()
54
+
55
+ def build_prompt(query, search_results):
56
+ context = "\n\n".join([f"Segment {i+1}: {result['content']}" for i, result in enumerate(search_results)])
57
+ prompt = prompt_template.format(question=query, context=context).strip()
58
+ return prompt
59
+
60
+ def llm(prompt):
61
+ response = requests.post('http://localhost:11434/api/generate', json={
62
+ 'model': 'phi',
63
+ 'prompt': prompt,
+ 'stream': False # request a single JSON object instead of a streamed response
64
+ })
65
+ if response.status_code == 200:
66
+ return response.json()['response']
67
+ else:
68
+ print(f"Error: {response.status_code} - {response.text}")
69
+ return None
70
+
71
+ def rag(query):
72
+ search_results = search(query)
73
+ prompt = build_prompt(query, search_results)
74
+ answer = llm(prompt)
75
+ return answer
76
+
77
+ # Evaluation metrics
78
+ def hit_rate(relevance_total):
79
+ return sum(any(line) for line in relevance_total) / len(relevance_total)
80
+
81
+ def mrr(relevance_total):
82
+ scores = []
83
+ for line in relevance_total:
84
+ for rank, relevant in enumerate(line, 1):
85
+ if relevant:
86
+ scores.append(1 / rank)
87
+ break
88
+ else:
89
+ scores.append(0)
90
+ return sum(scores) / len(scores)
91
+
92
+ def evaluate(ground_truth, search_function):
93
+ relevance_total = []
94
+ for _, row in tqdm(ground_truth.iterrows(), total=len(ground_truth)):
95
+ video_id = row['video_id']
96
+ results = search_function(row['question'])
97
+ relevance = [d['video_id'] == video_id for d in results]
98
+ relevance_total.append(relevance)
99
+ return {
100
+ 'hit_rate': hit_rate(relevance_total),
101
+ 'mrr': mrr(relevance_total),
102
+ }
103
+
104
+ # Parameter optimization
105
+ param_ranges = {
106
+ 'content': (0.0, 3.0),
107
+ }
108
+
109
+ def simple_optimize(param_ranges, objective_function, n_iterations=10):
110
+ best_params = None
111
+ best_score = float('-inf')
112
+ for _ in range(n_iterations):
113
+ current_params = {param: np.random.uniform(min_val, max_val)
114
+ for param, (min_val, max_val) in param_ranges.items()}
115
+ current_score = objective_function(current_params)
116
+ if current_score > best_score:
117
+ best_score = current_score
118
+ best_params = current_params
119
+ return best_params, best_score
120
+
121
+ def objective(boost_params):
122
+ def search_function(q):
123
+ return search(q, boost_params)
124
+ results = evaluate(ground_truth, search_function)
125
+ return results['mrr']
126
+
127
+ # RAG evaluation
128
+ prompt2_template = """
129
+ You are an expert evaluator for a Youtube transcript assistant.
130
+ Your task is to analyze the relevance of the generated answer to the given question.
131
+ Based on the relevance of the generated answer, you will classify it
132
+ as "NON_RELEVANT", "PARTLY_RELEVANT", or "RELEVANT".
133
+
134
+ Here is the data for evaluation:
135
+
136
+ Question: {question}
137
+ Generated Answer: {answer_llm}
138
+
139
+ Please analyze the content and context of the generated answer in relation to the question
140
+ and provide your evaluation in parsable JSON without using code blocks:
141
+
142
+ {{
143
+ "Relevance": "NON_RELEVANT" | "PARTLY_RELEVANT" | "RELEVANT",
144
+ "Explanation": "[Provide a brief explanation for your evaluation]"
145
+ }}
146
+ """.strip()
147
+
148
+ def evaluate_rag(sample_size=200):
149
+ sample = ground_truth.sample(n=min(sample_size, len(ground_truth)), random_state=1)
150
+ evaluations = []
151
+ for _, row in tqdm(sample.iterrows(), total=len(sample)):
152
+ question = row['question']
153
+ answer_llm = rag(question)
154
+ prompt = prompt2_template.format(question=question, answer_llm=answer_llm)
155
+ evaluation = llm(prompt)
156
+ evaluation = json.loads(evaluation)
157
+ evaluations.append((row['video_id'], question, answer_llm, evaluation['Relevance'], evaluation['Explanation']))
158
+ return evaluations
159
+
160
+ # Main execution
161
+ if __name__ == "__main__":
162
+ print("Evaluating search performance...")
163
+ search_performance = evaluate(ground_truth, search)
164
+ print(f"Search performance: {search_performance}")
165
+
166
+ print("\nOptimizing search parameters...")
167
+ best_params, best_score = simple_optimize(param_ranges, objective, n_iterations=20)
168
+ print(f"Best parameters: {best_params}")
169
+ print(f"Best score: {best_score}")
170
+
171
+ print("\nEvaluating RAG performance...")
172
+ rag_evaluations = evaluate_rag(sample_size=200)
173
+
174
+ # Store RAG evaluations in the database
175
+ cursor.execute('''
176
+ CREATE TABLE IF NOT EXISTS rag_evaluations (
177
+ video_id TEXT,
178
+ question TEXT,
179
+ answer TEXT,
180
+ relevance TEXT,
181
+ explanation TEXT
182
+ )
183
+ ''')
184
+ cursor.executemany('''
185
+ INSERT INTO rag_evaluations (video_id, question, answer, relevance, explanation)
186
+ VALUES (?, ?, ?, ?, ?)
187
+ ''', rag_evaluations)
188
+ conn.commit()
189
+
190
+ print("Evaluation complete. Results stored in the database.")
191
+
192
+ # Close the database connection
193
+ conn.close()
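The two retrieval metrics used above are easy to check by hand. The sketch below restates `hit_rate` and `mrr` in standalone form and runs them on a tiny, made-up relevance matrix: one row per question, one boolean per ranked result. The empty-input guard is an addition for the example, not part of the original functions.

```python
def hit_rate(relevance_total):
    """Share of questions with at least one relevant result."""
    if not relevance_total:
        return 0.0
    return sum(any(line) for line in relevance_total) / len(relevance_total)


def mrr(relevance_total):
    """Mean reciprocal rank of the first relevant result (0 when there is none)."""
    if not relevance_total:
        return 0.0
    scores = []
    for line in relevance_total:
        for rank, relevant in enumerate(line, 1):
            if relevant:
                scores.append(1 / rank)
                break
        else:
            scores.append(0)
    return sum(scores) / len(scores)


relevance_total = [
    [False, True, False],   # first relevant result at rank 2 -> 1/2
    [False, False, False],  # no relevant result              -> 0
    [True, False, False],   # first relevant result at rank 1 -> 1
]
print(hit_rate(relevance_total))  # 2 of 3 questions have a hit
print(mrr(relevance_total))       # (1/2 + 0 + 1) / 3
```

Note that in `evaluate` a result counts as relevant when its `video_id` matches the ground-truth row, so with a single indexed video every non-empty result list is a hit.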
app/transcript_extractor.py ADDED
@@ -0,0 +1,85 @@
1
+ from youtube_transcript_api import YouTubeTranscriptApi
2
+ from googleapiclient.discovery import build
3
+ from googleapiclient.errors import HttpError
4
+ import re
5
+ import os
6
+
7
+ # Replace with your actual API key
8
+ API_KEY = os.environ.get('YOUTUBE_API_KEY', 'YOUR_API_KEY_HERE')
9
+
10
+ youtube = build('youtube', 'v3', developerKey=API_KEY)
11
+
12
+ def extract_video_id(url):
13
+ video_id_match = re.search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", url)
14
+ if video_id_match:
15
+ return video_id_match.group(1)
16
+ return None
17
+
18
+ def get_video_metadata(video_id):
19
+ try:
20
+ request = youtube.videos().list(
21
+ part="snippet,contentDetails,statistics",
22
+ id=video_id
23
+ )
24
+ response = request.execute()
25
+
26
+ if 'items' in response and len(response['items']) > 0:
27
+ video = response['items'][0]
28
+ snippet = video['snippet']
29
+ return {
30
+ 'title': snippet['title'],
31
+ 'author': snippet['channelTitle'],
32
+ 'upload_date': snippet['publishedAt'],
33
+ 'view_count': video['statistics']['viewCount'],
34
+ 'like_count': video['statistics'].get('likeCount', 0),
35
+ 'comment_count': video['statistics'].get('commentCount', 0),
36
+ 'duration': video['contentDetails']['duration']
37
+ }
38
+ else:
39
+ return None
40
+ except HttpError as e:
41
+ print(f"An HTTP error {e.resp.status} occurred: {e.content}")
42
+ return None
43
+
44
+ def get_transcript(video_id):
45
+ try:
46
+ transcript = YouTubeTranscriptApi.get_transcript(video_id)
47
+ metadata = get_video_metadata(video_id) or {} # callers use metadata.get(...), so never return None here
48
+ return {
49
+ 'transcript': transcript,
50
+ 'metadata': metadata
51
+ }
52
+ except Exception as e:
53
+ print(f"Error extracting transcript for video {video_id}: {str(e)}")
54
+ return None
55
+
56
+ def get_channel_videos(channel_id):
57
+ try:
58
+ request = youtube.search().list(
59
+ part="id,snippet",
60
+ channelId=channel_id,
61
+ type="video",
62
+ maxResults=50 # Adjust as needed
63
+ )
64
+ response = request.execute()
65
+
66
+ videos = []
67
+ for item in response['items']:
68
+ videos.append({
69
+ 'video_id': item['id']['videoId'],
70
+ 'title': item['snippet']['title'],
71
+ 'description': item['snippet']['description'],
72
+ 'published_at': item['snippet']['publishedAt']
73
+ })
74
+ return videos
75
+ except HttpError as e:
76
+ print(f"An HTTP error {e.resp.status} occurred: {e.content}")
77
+ return []
78
+
79
+ def process_videos(video_ids):
80
+ transcripts = {}
81
+ for video_id in video_ids:
82
+ transcript_data = get_transcript(video_id)
83
+ if transcript_data:
84
+ transcripts[video_id] = transcript_data
85
+ return transcripts
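To see which inputs the `extract_video_id` pattern accepts, here is a standalone sketch using the same regular expression (minus the trailing `.*`, which does not affect the captured group). The wrapper `normalize_video_input` is a hypothetical helper, not part of the commit, added because the pattern alone does not match a bare 11-character ID.

```python
import re

# An 11-character ID that follows either "v=" or a slash.
VIDEO_ID_PATTERN = re.compile(r"(?:v=|/)([0-9A-Za-z_-]{11})")
BARE_ID_PATTERN = re.compile(r"[0-9A-Za-z_-]{11}")


def extract_video_id(url):
    """Return the video ID found in a YouTube URL, or None."""
    match = VIDEO_ID_PATTERN.search(url)
    return match.group(1) if match else None


def normalize_video_input(text):
    """Accept either a bare video ID or a URL (hypothetical convenience wrapper)."""
    text = text.strip()
    if BARE_ID_PATTERN.fullmatch(text):
        return text
    return extract_video_id(text)


for candidate in (
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42s",
    "https://youtu.be/dQw4w9WgXcQ",
    "https://www.youtube.com/embed/dQw4w9WgXcQ",
    "dQw4w9WgXcQ",
):
    print(candidate, "->", normalize_video_input(candidate))
```

The pattern is permissive: any URL with an 11-character path segment made of these characters will match, so IDs extracted this way are best treated as candidates until the API call confirms them.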
config/config.yaml ADDED
File without changes
data/sqlite.db ADDED
Binary file (32.8 kB). View file
 
docker-compose.yaml ADDED
@@ -0,0 +1,42 @@
1
+ version: '3.8'
2
+
3
+ services:
4
+ app:
5
+ build: .
6
+ ports:
7
+ - "8501:8501"
8
+ depends_on:
9
+ - elasticsearch
10
+ environment:
11
+ - ELASTICSEARCH_HOST=elasticsearch
12
+ - ELASTICSEARCH_PORT=9200
13
+ - YOUTUBE_API_KEY=${YOUTUBE_API_KEY}
14
+ env_file:
15
+ - .env
16
+ volumes:
17
+ - ./data:/app/data
18
+ - ./config:/app/config
19
+
20
+ elasticsearch:
21
+ image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
22
+ environment:
23
+ - discovery.type=single-node
24
+ - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
25
+ ports:
26
+ - "9200:9200"
27
+ volumes:
28
+ - esdata:/usr/share/elasticsearch/data
29
+
30
+ grafana:
31
+ image: grafana/grafana:latest
32
+ ports:
33
+ - "3000:3000"
34
+ volumes:
35
+ - grafana-storage:/var/lib/grafana
36
+ - ./grafana/provisioning:/etc/grafana/provisioning
37
+ depends_on:
38
+ - elasticsearch
39
+
40
+ volumes:
41
+ esdata:
42
+ grafana-storage:
grafana/provisioning/dashboards/rag_evaluation.json ADDED
@@ -0,0 +1,129 @@
1
+ {
2
+ "annotations": {
3
+ "list": [
4
+ {
5
+ "builtIn": 1,
6
+ "datasource": "-- Grafana --",
7
+ "enable": true,
8
+ "hide": true,
9
+ "iconColor": "rgba(0, 211, 255, 1)",
10
+ "name": "Annotations & Alerts",
11
+ "type": "dashboard"
12
+ }
13
+ ]
14
+ },
15
+ "editable": true,
16
+ "gnetId": null,
17
+ "graphTooltip": 0,
18
+ "id": 1,
19
+ "links": [],
20
+ "panels": [
21
+ {
22
+ "aliasColors": {},
23
+ "bars": false,
24
+ "dashLength": 10,
25
+ "dashes": false,
26
+ "datasource": "SQLite",
27
+ "fieldConfig": {
28
+ "defaults": {},
29
+ "overrides": []
30
+ },
31
+ "fill": 1,
32
+ "fillGradient": 0,
33
+ "gridPos": {
34
+ "h": 9,
35
+ "w": 12,
36
+ "x": 0,
37
+ "y": 0
38
+ },
39
+ "hiddenSeries": false,
40
+ "id": 2,
41
+ "legend": {
42
+ "avg": false,
43
+ "current": false,
44
+ "max": false,
45
+ "min": false,
46
+ "show": true,
47
+ "total": false,
48
+ "values": false
49
+ },
50
+ "lines": true,
51
+ "linewidth": 1,
52
+ "nullPointMode": "null",
53
+ "options": {
54
+ "alertThreshold": true
55
+ },
56
+ "percentage": false,
57
+ "pluginVersion": "7.5.7",
58
+ "pointradius": 2,
59
+ "points": false,
60
+ "renderer": "flot",
61
+ "seriesOverrides": [],
62
+ "spaceLength": 10,
63
+ "stack": false,
64
+ "steppedLine": false,
65
+ "targets": [
66
+ {
67
+ "queryType": "table",
68
+ "refId": "A",
69
+ "sql": "SELECT relevance, COUNT(*) as count FROM rag_evaluations GROUP BY relevance"
70
+ }
71
+ ],
72
+ "thresholds": [],
73
+ "timeFrom": null,
74
+ "timeRegions": [],
75
+ "timeShift": null,
76
+ "title": "RAG Evaluation Results",
77
+ "tooltip": {
78
+ "shared": true,
79
+ "sort": 0,
80
+ "value_type": "individual"
81
+ },
82
+ "type": "graph",
83
+ "xaxis": {
84
+ "buckets": null,
85
+ "mode": "categories",
86
+ "name": null,
87
+ "show": true,
88
+ "values": []
89
+ },
90
+ "yaxes": [
91
+ {
92
+ "format": "short",
93
+ "label": null,
94
+ "logBase": 1,
95
+ "max": null,
96
+ "min": null,
97
+ "show": true
98
+ },
99
+ {
100
+ "format": "short",
101
+ "label": null,
102
+ "logBase": 1,
103
+ "max": null,
104
+ "min": null,
105
+ "show": true
106
+ }
107
+ ],
108
+ "yaxis": {
109
+ "align": false,
110
+ "alignLevel": null
111
+ }
112
+ }
113
+ ],
114
+ "schemaVersion": 27,
115
+ "style": "dark",
116
+ "tags": [],
117
+ "templating": {
118
+ "list": []
119
+ },
120
+ "time": {
121
+ "from": "now-6h",
122
+ "to": "now"
123
+ },
124
+ "timepicker": {},
125
+ "timezone": "",
126
+ "title": "RAG Evaluation Dashboard",
127
+ "uid": "rag_evaluation",
128
+ "version": 1
129
+ }
grafana/provisioning/datasources/sqlite.yaml ADDED
@@ -0,0 +1,7 @@
1
+ apiVersion: 1
2
+
3
+ datasources:
4
+ - name: SQLite
5
+ type: sqlite
6
+ url: /app/data/sqlite.db
7
+ isDefault: true
llmrag/Scripts/Activate.ps1 ADDED
@@ -0,0 +1,472 @@
1
+ <#
2
+ .Synopsis
3
+ Activate a Python virtual environment for the current PowerShell session.
4
+
5
+ .Description
6
+ Pushes the python executable for a virtual environment to the front of the
7
+ $Env:PATH environment variable and sets the prompt to signify that you are
8
+ in a Python virtual environment. Makes use of the command line switches as
9
+ well as the `pyvenv.cfg` file values present in the virtual environment.
10
+
11
+ .Parameter VenvDir
12
+ Path to the directory that contains the virtual environment to activate. The
13
+ default value for this is the parent of the directory that the Activate.ps1
14
+ script is located within.
15
+
16
+ .Parameter Prompt
17
+ The prompt prefix to display when this virtual environment is activated. By
18
+ default, this prompt is the name of the virtual environment folder (VenvDir)
19
+ surrounded by parentheses and followed by a single space (ie. '(.venv) ').
20
+
21
+ .Example
22
+ Activate.ps1
23
+ Activates the Python virtual environment that contains the Activate.ps1 script.
24
+
25
+ .Example
26
+ Activate.ps1 -Verbose
27
+ Activates the Python virtual environment that contains the Activate.ps1 script,
28
+ and shows extra information about the activation as it executes.
29
+
30
+ .Example
31
+ Activate.ps1 -VenvDir C:\Users\MyUser\Common\.venv
32
+ Activates the Python virtual environment located in the specified location.
33
+
34
+ .Example
35
+ Activate.ps1 -Prompt "MyPython"
36
+ Activates the Python virtual environment that contains the Activate.ps1 script,
37
+ and prefixes the current prompt with the specified string (surrounded in
38
+ parentheses) while the virtual environment is active.
39
+
40
+ .Notes
41
+ On Windows, it may be required to enable this Activate.ps1 script by setting the
42
+ execution policy for the user. You can do this by issuing the following PowerShell
43
+ command:
44
+
45
+ PS C:\> Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
46
+
47
+ For more information on Execution Policies:
48
+ https://go.microsoft.com/fwlink/?LinkID=135170
49
+
50
+ #>
51
+ Param(
52
+ [Parameter(Mandatory = $false)]
53
+ [String]
54
+ $VenvDir,
55
+ [Parameter(Mandatory = $false)]
56
+ [String]
57
+ $Prompt
58
+ )
59
+
60
+ <# Function declarations --------------------------------------------------- #>
61
+
62
+ <#
63
+ .Synopsis
64
+ Remove all shell session elements added by the Activate script, including the
65
+ addition of the virtual environment's Python executable from the beginning of
66
+ the PATH variable.
67
+
68
+ .Parameter NonDestructive
69
+ If present, do not remove this function from the global namespace for the
70
+ session.
71
+
72
+ #>
73
+ function global:deactivate ([switch]$NonDestructive) {
74
+ # Revert to original values
75
+
76
+ # The prior prompt:
77
+ if (Test-Path -Path Function:_OLD_VIRTUAL_PROMPT) {
78
+ Copy-Item -Path Function:_OLD_VIRTUAL_PROMPT -Destination Function:prompt
79
+ Remove-Item -Path Function:_OLD_VIRTUAL_PROMPT
80
+ }
81
+
82
+ # The prior PYTHONHOME:
83
+ if (Test-Path -Path Env:_OLD_VIRTUAL_PYTHONHOME) {
84
+ Copy-Item -Path Env:_OLD_VIRTUAL_PYTHONHOME -Destination Env:PYTHONHOME
85
+ Remove-Item -Path Env:_OLD_VIRTUAL_PYTHONHOME
86
+ }
87
+
88
+ # The prior PATH:
89
+ if (Test-Path -Path Env:_OLD_VIRTUAL_PATH) {
90
+ Copy-Item -Path Env:_OLD_VIRTUAL_PATH -Destination Env:PATH
91
+ Remove-Item -Path Env:_OLD_VIRTUAL_PATH
92
+ }
93
+
94
+ # Just remove the VIRTUAL_ENV altogether:
95
+ if (Test-Path -Path Env:VIRTUAL_ENV) {
96
+ Remove-Item -Path env:VIRTUAL_ENV
97
+ }
98
+
99
+ # Just remove VIRTUAL_ENV_PROMPT altogether.
100
+ if (Test-Path -Path Env:VIRTUAL_ENV_PROMPT) {
101
+ Remove-Item -Path env:VIRTUAL_ENV_PROMPT
102
+ }
103
+
104
+ # Just remove the _PYTHON_VENV_PROMPT_PREFIX altogether:
105
+ if (Get-Variable -Name "_PYTHON_VENV_PROMPT_PREFIX" -ErrorAction SilentlyContinue) {
106
+ Remove-Variable -Name _PYTHON_VENV_PROMPT_PREFIX -Scope Global -Force
107
+ }
108
+
109
+ # Leave deactivate function in the global namespace if requested:
110
+ if (-not $NonDestructive) {
111
+ Remove-Item -Path function:deactivate
112
+ }
113
+ }
114
+
115
+ <#
116
+ .Description
117
+ Get-PyVenvConfig parses the values from the pyvenv.cfg file located in the
118
+ given folder, and returns them in a map.
119
+
120
+ For each line in the pyvenv.cfg file, if that line can be parsed into exactly
121
+ two strings separated by `=` (with any amount of whitespace surrounding the =)
122
+ then it is considered a `key = value` line. The left hand string is the key,
123
+ the right hand is the value.
124
+
125
+ If the value starts with a `'` or a `"` then the first and last character is
126
+ stripped from the value before being captured.
127
+
128
+ .Parameter ConfigDir
129
+ Path to the directory that contains the `pyvenv.cfg` file.
130
+ #>
131
+ function Get-PyVenvConfig(
132
+ [String]
133
+ $ConfigDir
134
+ ) {
135
+ Write-Verbose "Given ConfigDir=$ConfigDir, obtain values in pyvenv.cfg"
136
+
137
+ # Ensure the file exists, and issue a warning if it doesn't (but still allow the function to continue).
138
+ $pyvenvConfigPath = Join-Path -Resolve -Path $ConfigDir -ChildPath 'pyvenv.cfg' -ErrorAction Continue
139
+
140
+ # An empty map will be returned if no config file is found.
141
+ $pyvenvConfig = @{ }
142
+
143
+ if ($pyvenvConfigPath) {
144
+
145
+ Write-Verbose "File exists, parse `key = value` lines"
146
+ $pyvenvConfigContent = Get-Content -Path $pyvenvConfigPath
147
+
148
+ $pyvenvConfigContent | ForEach-Object {
149
+ $keyval = $PSItem -split "\s*=\s*", 2
150
+ if ($keyval[0] -and $keyval[1]) {
151
+ $val = $keyval[1]
152
+
153
+ # Remove extraneous quotations around a string value.
154
+ if ("'""".Contains($val.Substring(0, 1))) {
155
+ $val = $val.Substring(1, $val.Length - 2)
156
+ }
157
+
158
+ $pyvenvConfig[$keyval[0]] = $val
159
+ Write-Verbose "Adding Key: '$($keyval[0])'='$val'"
160
+ }
161
+ }
162
+ }
163
+ return $pyvenvConfig
164
+ }
165
+
166
+
167
+ <# Begin Activate script --------------------------------------------------- #>
168
+
169
+ # Determine the containing directory of this script
170
+ $VenvExecPath = Split-Path -Parent $MyInvocation.MyCommand.Definition
171
+ $VenvExecDir = Get-Item -Path $VenvExecPath
172
+
173
+ Write-Verbose "Activation script is located in path: '$VenvExecPath'"
174
+ Write-Verbose "VenvExecDir Fullname: '$($VenvExecDir.FullName)"
175
+ Write-Verbose "VenvExecDir Name: '$($VenvExecDir.Name)"
176
+
177
+ # Set values required in priority: CmdLine, ConfigFile, Default
178
+ # First, get the location of the virtual environment, it might not be
179
+ # VenvExecDir if specified on the command line.
180
+ if ($VenvDir) {
181
+ Write-Verbose "VenvDir given as parameter, using '$VenvDir' to determine values"
182
+ }
183
+ else {
184
+ Write-Verbose "VenvDir not given as a parameter, using parent directory name as VenvDir."
185
+ $VenvDir = $VenvExecDir.Parent.FullName.TrimEnd("\\/")
186
+ Write-Verbose "VenvDir=$VenvDir"
187
+ }
188
+
189
+ # Next, read the `pyvenv.cfg` file to determine any required value such
190
+ # as `prompt`.
191
+ $pyvenvCfg = Get-PyVenvConfig -ConfigDir $VenvDir
192
+
193
+ # Next, set the prompt from the command line, or the config file, or
194
+ # just use the name of the virtual environment folder.
195
+ if ($Prompt) {
196
+ Write-Verbose "Prompt specified as argument, using '$Prompt'"
197
+ }
198
+ else {
199
+ Write-Verbose "Prompt not specified as argument to script, checking pyvenv.cfg value"
200
+ if ($pyvenvCfg -and $pyvenvCfg['prompt']) {
201
+ Write-Verbose " Setting based on value in pyvenv.cfg='$($pyvenvCfg['prompt'])'"
202
+ $Prompt = $pyvenvCfg['prompt'];
203
+ }
204
+ else {
205
+ Write-Verbose " Setting prompt based on parent's directory's name. (Is the directory name passed to venv module when creating the virtual environment)"
206
+ Write-Verbose " Got leaf-name of $VenvDir='$(Split-Path -Path $venvDir -Leaf)'"
207
+ $Prompt = Split-Path -Path $venvDir -Leaf
208
+ }
209
+ }
210
+
211
+ Write-Verbose "Prompt = '$Prompt'"
212
+ Write-Verbose "VenvDir='$VenvDir'"
213
+
214
+ # Deactivate any currently active virtual environment, but leave the
215
+ # deactivate function in place.
216
+ deactivate -nondestructive
217
+
218
+ # Now set the environment variable VIRTUAL_ENV, used by many tools to determine
219
+ # that there is an activated venv.
220
+ $env:VIRTUAL_ENV = $VenvDir
221
+
222
+ if (-not $Env:VIRTUAL_ENV_DISABLE_PROMPT) {
223
+
224
+ Write-Verbose "Setting prompt to '$Prompt'"
225
+
226
+ # Set the prompt to include the env name
227
+ # Make sure _OLD_VIRTUAL_PROMPT is global
228
+ function global:_OLD_VIRTUAL_PROMPT { "" }
229
+ Copy-Item -Path function:prompt -Destination function:_OLD_VIRTUAL_PROMPT
230
+ New-Variable -Name _PYTHON_VENV_PROMPT_PREFIX -Description "Python virtual environment prompt prefix" -Scope Global -Option ReadOnly -Visibility Public -Value $Prompt
231
+
232
+ function global:prompt {
233
+ Write-Host -NoNewline -ForegroundColor Green "($_PYTHON_VENV_PROMPT_PREFIX) "
234
+ _OLD_VIRTUAL_PROMPT
235
+ }
236
+ $env:VIRTUAL_ENV_PROMPT = $Prompt
237
+ }
238
+
239
+ # Clear PYTHONHOME
240
+ if (Test-Path -Path Env:PYTHONHOME) {
241
+ Copy-Item -Path Env:PYTHONHOME -Destination Env:_OLD_VIRTUAL_PYTHONHOME
242
+ Remove-Item -Path Env:PYTHONHOME
243
+ }
244
+
245
+ # Add the venv to the PATH
246
+ Copy-Item -Path Env:PATH -Destination Env:_OLD_VIRTUAL_PATH
247
+ $Env:PATH = "$VenvExecDir$([System.IO.Path]::PathSeparator)$Env:PATH"
248
+
249
+ # SIG # Begin signature block
250
+ # MIIpigYJKoZIhvcNAQcCoIIpezCCKXcCAQExDzANBglghkgBZQMEAgEFADB5Bgor
251
+ # BgEEAYI3AgEEoGswaTA0BgorBgEEAYI3AgEeMCYCAwEAAAQQH8w7YFlLCE63JNLG
252
+ # KX7zUQIBAAIBAAIBAAIBAAIBADAxMA0GCWCGSAFlAwQCAQUABCBnL745ElCYk8vk
253
+ # dBtMuQhLeWJ3ZGfzKW4DHCYzAn+QB6CCDi8wggawMIIEmKADAgECAhAIrUCyYNKc
254
+ # TJ9ezam9k67ZMA0GCSqGSIb3DQEBDAUAMGIxCzAJBgNVBAYTAlVTMRUwEwYDVQQK
255
+ # EwxEaWdpQ2VydCBJbmMxGTAXBgNVBAsTEHd3dy5kaWdpY2VydC5jb20xITAfBgNV
256
+ # BAMTGERpZ2lDZXJ0IFRydXN0ZWQgUm9vdCBHNDAeFw0yMTA0MjkwMDAwMDBaFw0z
257
+ # NjA0MjgyMzU5NTlaMGkxCzAJBgNVBAYTAlVTMRcwFQYDVQQKEw5EaWdpQ2VydCwg
258
+ # SW5jLjFBMD8GA1UEAxM4RGlnaUNlcnQgVHJ1c3RlZCBHNCBDb2RlIFNpZ25pbmcg
259
+ # UlNBNDA5NiBTSEEzODQgMjAyMSBDQTEwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAw
260
+ # ggIKAoICAQDVtC9C0CiteLdd1TlZG7GIQvUzjOs9gZdwxbvEhSYwn6SOaNhc9es0
261
+ # JAfhS0/TeEP0F9ce2vnS1WcaUk8OoVf8iJnBkcyBAz5NcCRks43iCH00fUyAVxJr
262
+ # Q5qZ8sU7H/Lvy0daE6ZMswEgJfMQ04uy+wjwiuCdCcBlp/qYgEk1hz1RGeiQIXhF
263
+ # LqGfLOEYwhrMxe6TSXBCMo/7xuoc82VokaJNTIIRSFJo3hC9FFdd6BgTZcV/sk+F
264
+ # LEikVoQ11vkunKoAFdE3/hoGlMJ8yOobMubKwvSnowMOdKWvObarYBLj6Na59zHh
265
+ # 3K3kGKDYwSNHR7OhD26jq22YBoMbt2pnLdK9RBqSEIGPsDsJ18ebMlrC/2pgVItJ
266
+ # wZPt4bRc4G/rJvmM1bL5OBDm6s6R9b7T+2+TYTRcvJNFKIM2KmYoX7BzzosmJQay
267
+ # g9Rc9hUZTO1i4F4z8ujo7AqnsAMrkbI2eb73rQgedaZlzLvjSFDzd5Ea/ttQokbI
268
+ # YViY9XwCFjyDKK05huzUtw1T0PhH5nUwjewwk3YUpltLXXRhTT8SkXbev1jLchAp
269
+ # QfDVxW0mdmgRQRNYmtwmKwH0iU1Z23jPgUo+QEdfyYFQc4UQIyFZYIpkVMHMIRro
270
+ # OBl8ZhzNeDhFMJlP/2NPTLuqDQhTQXxYPUez+rbsjDIJAsxsPAxWEQIDAQABo4IB
271
+ # WTCCAVUwEgYDVR0TAQH/BAgwBgEB/wIBADAdBgNVHQ4EFgQUaDfg67Y7+F8Rhvv+
272
+ # YXsIiGX0TkIwHwYDVR0jBBgwFoAU7NfjgtJxXWRM3y5nP+e6mK4cD08wDgYDVR0P
273
+ # AQH/BAQDAgGGMBMGA1UdJQQMMAoGCCsGAQUFBwMDMHcGCCsGAQUFBwEBBGswaTAk
274
+ # BggrBgEFBQcwAYYYaHR0cDovL29jc3AuZGlnaWNlcnQuY29tMEEGCCsGAQUFBzAC
275
+ # hjVodHRwOi8vY2FjZXJ0cy5kaWdpY2VydC5jb20vRGlnaUNlcnRUcnVzdGVkUm9v
276
+ # dEc0LmNydDBDBgNVHR8EPDA6MDigNqA0hjJodHRwOi8vY3JsMy5kaWdpY2VydC5j
277
+ # b20vRGlnaUNlcnRUcnVzdGVkUm9vdEc0LmNybDAcBgNVHSAEFTATMAcGBWeBDAED
278
+ # MAgGBmeBDAEEATANBgkqhkiG9w0BAQwFAAOCAgEAOiNEPY0Idu6PvDqZ01bgAhql
279
+ # +Eg08yy25nRm95RysQDKr2wwJxMSnpBEn0v9nqN8JtU3vDpdSG2V1T9J9Ce7FoFF
280
+ # UP2cvbaF4HZ+N3HLIvdaqpDP9ZNq4+sg0dVQeYiaiorBtr2hSBh+3NiAGhEZGM1h
281
+ # mYFW9snjdufE5BtfQ/g+lP92OT2e1JnPSt0o618moZVYSNUa/tcnP/2Q0XaG3Ryw
282
+ # YFzzDaju4ImhvTnhOE7abrs2nfvlIVNaw8rpavGiPttDuDPITzgUkpn13c5Ubdld
283
+ # AhQfQDN8A+KVssIhdXNSy0bYxDQcoqVLjc1vdjcshT8azibpGL6QB7BDf5WIIIJw
284
+ # 8MzK7/0pNVwfiThV9zeKiwmhywvpMRr/LhlcOXHhvpynCgbWJme3kuZOX956rEnP
285
+ # LqR0kq3bPKSchh/jwVYbKyP/j7XqiHtwa+aguv06P0WmxOgWkVKLQcBIhEuWTatE
286
+ # QOON8BUozu3xGFYHKi8QxAwIZDwzj64ojDzLj4gLDb879M4ee47vtevLt/B3E+bn
287
+ # KD+sEq6lLyJsQfmCXBVmzGwOysWGw/YmMwwHS6DTBwJqakAwSEs0qFEgu60bhQji
288
+ # WQ1tygVQK+pKHJ6l/aCnHwZ05/LWUpD9r4VIIflXO7ScA+2GRfS0YW6/aOImYIbq
289
+ # yK+p/pQd52MbOoZWeE4wggd3MIIFX6ADAgECAhAHHxQbizANJfMU6yMM0NHdMA0G
290
+ # CSqGSIb3DQEBCwUAMGkxCzAJBgNVBAYTAlVTMRcwFQYDVQQKEw5EaWdpQ2VydCwg
291
+ # SW5jLjFBMD8GA1UEAxM4RGlnaUNlcnQgVHJ1c3RlZCBHNCBDb2RlIFNpZ25pbmcg
292
+ # UlNBNDA5NiBTSEEzODQgMjAyMSBDQTEwHhcNMjIwMTE3MDAwMDAwWhcNMjUwMTE1
293
+ # MjM1OTU5WjB8MQswCQYDVQQGEwJVUzEPMA0GA1UECBMGT3JlZ29uMRIwEAYDVQQH
294
+ # EwlCZWF2ZXJ0b24xIzAhBgNVBAoTGlB5dGhvbiBTb2Z0d2FyZSBGb3VuZGF0aW9u
295
+ # MSMwIQYDVQQDExpQeXRob24gU29mdHdhcmUgRm91bmRhdGlvbjCCAiIwDQYJKoZI
296
+ # hvcNAQEBBQADggIPADCCAgoCggIBAKgc0BTT+iKbtK6f2mr9pNMUTcAJxKdsuOiS
297
+ # YgDFfwhjQy89koM7uP+QV/gwx8MzEt3c9tLJvDccVWQ8H7mVsk/K+X+IufBLCgUi
298
+ # 0GGAZUegEAeRlSXxxhYScr818ma8EvGIZdiSOhqjYc4KnfgfIS4RLtZSrDFG2tN1
299
+ # 6yS8skFa3IHyvWdbD9PvZ4iYNAS4pjYDRjT/9uzPZ4Pan+53xZIcDgjiTwOh8VGu
300
+ # ppxcia6a7xCyKoOAGjvCyQsj5223v1/Ig7Dp9mGI+nh1E3IwmyTIIuVHyK6Lqu35
301
+ # 2diDY+iCMpk9ZanmSjmB+GMVs+H/gOiofjjtf6oz0ki3rb7sQ8fTnonIL9dyGTJ0
302
+ # ZFYKeb6BLA66d2GALwxZhLe5WH4Np9HcyXHACkppsE6ynYjTOd7+jN1PRJahN1oE
303
+ # RzTzEiV6nCO1M3U1HbPTGyq52IMFSBM2/07WTJSbOeXjvYR7aUxK9/ZkJiacl2iZ
304
+ # I7IWe7JKhHohqKuceQNyOzxTakLcRkzynvIrk33R9YVqtB4L6wtFxhUjvDnQg16x
305
+ # ot2KVPdfyPAWd81wtZADmrUtsZ9qG79x1hBdyOl4vUtVPECuyhCxaw+faVjumapP
306
+ # Unwo8ygflJJ74J+BYxf6UuD7m8yzsfXWkdv52DjL74TxzuFTLHPyARWCSCAbzn3Z
307
+ # Ily+qIqDAgMBAAGjggIGMIICAjAfBgNVHSMEGDAWgBRoN+Drtjv4XxGG+/5hewiI
308
+ # ZfROQjAdBgNVHQ4EFgQUt/1Teh2XDuUj2WW3siYWJgkZHA8wDgYDVR0PAQH/BAQD
309
+ # AgeAMBMGA1UdJQQMMAoGCCsGAQUFBwMDMIG1BgNVHR8Ega0wgaowU6BRoE+GTWh0
310
+ # dHA6Ly9jcmwzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRydXN0ZWRHNENvZGVTaWdu
311
+ # aW5nUlNBNDA5NlNIQTM4NDIwMjFDQTEuY3JsMFOgUaBPhk1odHRwOi8vY3JsNC5k
312
+ # aWdpY2VydC5jb20vRGlnaUNlcnRUcnVzdGVkRzRDb2RlU2lnbmluZ1JTQTQwOTZT
313
+ # SEEzODQyMDIxQ0ExLmNybDA+BgNVHSAENzA1MDMGBmeBDAEEATApMCcGCCsGAQUF
314
+ # BwIBFhtodHRwOi8vd3d3LmRpZ2ljZXJ0LmNvbS9DUFMwgZQGCCsGAQUFBwEBBIGH
315
+ # MIGEMCQGCCsGAQUFBzABhhhodHRwOi8vb2NzcC5kaWdpY2VydC5jb20wXAYIKwYB
316
+ # BQUHMAKGUGh0dHA6Ly9jYWNlcnRzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRydXN0
317
+ # ZWRHNENvZGVTaWduaW5nUlNBNDA5NlNIQTM4NDIwMjFDQTEuY3J0MAwGA1UdEwEB
318
+ # /wQCMAAwDQYJKoZIhvcNAQELBQADggIBABxv4AeV/5ltkELHSC63fXAFYS5tadcW
319
+ # TiNc2rskrNLrfH1Ns0vgSZFoQxYBFKI159E8oQQ1SKbTEubZ/B9kmHPhprHya08+
320
+ # VVzxC88pOEvz68nA82oEM09584aILqYmj8Pj7h/kmZNzuEL7WiwFa/U1hX+XiWfL
321
+ # IJQsAHBla0i7QRF2de8/VSF0XXFa2kBQ6aiTsiLyKPNbaNtbcucaUdn6vVUS5izW
322
+ # OXM95BSkFSKdE45Oq3FForNJXjBvSCpwcP36WklaHL+aHu1upIhCTUkzTHMh8b86
323
+ # WmjRUqbrnvdyR2ydI5l1OqcMBjkpPpIV6wcc+KY/RH2xvVuuoHjlUjwq2bHiNoX+
324
+ # W1scCpnA8YTs2d50jDHUgwUo+ciwpffH0Riq132NFmrH3r67VaN3TuBxjI8SIZM5
325
+ # 8WEDkbeoriDk3hxU8ZWV7b8AW6oyVBGfM06UgkfMb58h+tJPrFx8VI/WLq1dTqMf
326
+ # ZOm5cuclMnUHs2uqrRNtnV8UfidPBL4ZHkTcClQbCoz0UbLhkiDvIS00Dn+BBcxw
327
+ # /TKqVL4Oaz3bkMSsM46LciTeucHY9ExRVt3zy7i149sd+F4QozPqn7FrSVHXmem3
328
+ # r7bjyHTxOgqxRCVa18Vtx7P/8bYSBeS+WHCKcliFCecspusCDSlnRUjZwyPdP0VH
329
+ # xaZg2unjHY3rMYIasTCCGq0CAQEwfTBpMQswCQYDVQQGEwJVUzEXMBUGA1UEChMO
330
+ # RGlnaUNlcnQsIEluYy4xQTA/BgNVBAMTOERpZ2lDZXJ0IFRydXN0ZWQgRzQgQ29k
331
+ # ZSBTaWduaW5nIFJTQTQwOTYgU0hBMzg0IDIwMjEgQ0ExAhAHHxQbizANJfMU6yMM
332
+ # 0NHdMA0GCWCGSAFlAwQCAQUAoIHEMBkGCSqGSIb3DQEJAzEMBgorBgEEAYI3AgEE
333
+ # MBwGCisGAQQBgjcCAQsxDjAMBgorBgEEAYI3AgEVMC8GCSqGSIb3DQEJBDEiBCBn
334
+ # AZ6P7YvTwq0fbF62o7E75R0LxsW5OtyYiFESQckLhjBYBgorBgEEAYI3AgEMMUow
335
+ # SKBGgEQAQgB1AGkAbAB0ADoAIABSAGUAbABlAGEAcwBlAF8AdgAzAC4AMQAxAC4A
336
+ # MABfADIAMAAyADIAMQAwADIANAAuADAAMTANBgkqhkiG9w0BAQEFAASCAgAu2uG5
337
+ # zPAAKY4N8BVMzMPRSoTqq2HAcX+oqvto72DGzHLKlfAuuyf59saf7TQZQ04Ao1ni
338
+ # EvpzZ8C4Wv7yu8RyPwJQThIuFQuhMgB+Zscl+YDnAo5+GFTBpevgcG2n2ClHAPuT
339
+ # 7aXe3+5wChDpMqyusrBYws+8R6tg8rKFyRhQndxIJkIMlZhoh1qI3tRypW6e2r5l
340
+ # Uf4pPDkNBBySzjNOupTyv1/d2Y31Ise8xLrLbuMLYxtir/5A0z6GlUueoecpe9TS
341
+ # uEqz2bI+HZbGC6xK2BU4vW8s7qefVTmPFAf3JiCjZZ46qFAg9jnWCRzAA/3jOtu6
342
+ # V345rFhCRJxPKz4M96B5mUCnMU0BB4cHJFKZfezd5phtExi1///WcnKNkpNTto+d
343
+ # etpWbJ87DibBro3ZhDPh9FpHW2jxy2IQBZo02Udbwfd7aoKhRf7MCLqZUIziPjRS
344
+ # FcA1hyOzYk4XfHK1qW3Wpflduz86UGDbURWP3XhXQNaSScJGOhVylZbiBWcjFKlD
345
+ # E/sl+bDyafUy0jLur6/Vl4H2xCgXbJlEazr04QfizW9N9x2G6sDkdbQd4k3kSEJt
346
+ # UOufbrdjDY1MRd/NlnjVGY+zslEDN9QJQuKq00SJagicDJ+vIzg6J7YjnRfDGLAi
347
+ # RJb9rXxuQyEoSTdtxQgnPNkb6vCNQz80bjHmoqGCFz4wghc6BgorBgEEAYI3AwMB
348
+ # MYIXKjCCFyYGCSqGSIb3DQEHAqCCFxcwghcTAgEDMQ8wDQYJYIZIAWUDBAIBBQAw
349
+ # eAYLKoZIhvcNAQkQAQSgaQRnMGUCAQEGCWCGSAGG/WwHATAxMA0GCWCGSAFlAwQC
350
+ # AQUABCCJnxONky4RAgM+R4O2F+soqJ9cjrZDLL3JqXN+msPWngIRAPgphjs42egI
351
+ # Fn/RXf6+TgkYDzIwMjIxMDI0MTgzMzM4WqCCEwcwggbAMIIEqKADAgECAhAMTWly
352
+ # S5T6PCpKPSkHgD1aMA0GCSqGSIb3DQEBCwUAMGMxCzAJBgNVBAYTAlVTMRcwFQYD
353
+ # VQQKEw5EaWdpQ2VydCwgSW5jLjE7MDkGA1UEAxMyRGlnaUNlcnQgVHJ1c3RlZCBH
354
+ # NCBSU0E0MDk2IFNIQTI1NiBUaW1lU3RhbXBpbmcgQ0EwHhcNMjIwOTIxMDAwMDAw
355
+ # WhcNMzMxMTIxMjM1OTU5WjBGMQswCQYDVQQGEwJVUzERMA8GA1UEChMIRGlnaUNl
356
+ # cnQxJDAiBgNVBAMTG0RpZ2lDZXJ0IFRpbWVzdGFtcCAyMDIyIC0gMjCCAiIwDQYJ
357
+ # KoZIhvcNAQEBBQADggIPADCCAgoCggIBAM/spSY6xqnya7uNwQ2a26HoFIV0Mxom
358
+ # rNAcVR4eNm28klUMYfSdCXc9FZYIL2tkpP0GgxbXkZI4HDEClvtysZc6Va8z7GGK
359
+ # 6aYo25BjXL2JU+A6LYyHQq4mpOS7eHi5ehbhVsbAumRTuyoW51BIu4hpDIjG8b7g
360
+ # L307scpTjUCDHufLckkoHkyAHoVW54Xt8mG8qjoHffarbuVm3eJc9S/tjdRNlYRo
361
+ # 44DLannR0hCRRinrPibytIzNTLlmyLuqUDgN5YyUXRlav/V7QG5vFqianJVHhoV5
362
+ # PgxeZowaCiS+nKrSnLb3T254xCg/oxwPUAY3ugjZNaa1Htp4WB056PhMkRCWfk3h
363
+ # 3cKtpX74LRsf7CtGGKMZ9jn39cFPcS6JAxGiS7uYv/pP5Hs27wZE5FX/NurlfDHn
364
+ # 88JSxOYWe1p+pSVz28BqmSEtY+VZ9U0vkB8nt9KrFOU4ZodRCGv7U0M50GT6Vs/g
365
+ # 9ArmFG1keLuY/ZTDcyHzL8IuINeBrNPxB9ThvdldS24xlCmL5kGkZZTAWOXlLimQ
366
+ # prdhZPrZIGwYUWC6poEPCSVT8b876asHDmoHOWIZydaFfxPZjXnPYsXs4Xu5zGcT
367
+ # B5rBeO3GiMiwbjJ5xwtZg43G7vUsfHuOy2SJ8bHEuOdTXl9V0n0ZKVkDTvpd6kVz
368
+ # HIR+187i1Dp3AgMBAAGjggGLMIIBhzAOBgNVHQ8BAf8EBAMCB4AwDAYDVR0TAQH/
369
+ # BAIwADAWBgNVHSUBAf8EDDAKBggrBgEFBQcDCDAgBgNVHSAEGTAXMAgGBmeBDAEE
370
+ # AjALBglghkgBhv1sBwEwHwYDVR0jBBgwFoAUuhbZbU2FL3MpdpovdYxqII+eyG8w
371
+ # HQYDVR0OBBYEFGKK3tBh/I8xFO2XC809KpQU31KcMFoGA1UdHwRTMFEwT6BNoEuG
372
+ # SWh0dHA6Ly9jcmwzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRydXN0ZWRHNFJTQTQw
373
+ # OTZTSEEyNTZUaW1lU3RhbXBpbmdDQS5jcmwwgZAGCCsGAQUFBwEBBIGDMIGAMCQG
374
+ # CCsGAQUFBzABhhhodHRwOi8vb2NzcC5kaWdpY2VydC5jb20wWAYIKwYBBQUHMAKG
375
+ # TGh0dHA6Ly9jYWNlcnRzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRydXN0ZWRHNFJT
376
+ # QTQwOTZTSEEyNTZUaW1lU3RhbXBpbmdDQS5jcnQwDQYJKoZIhvcNAQELBQADggIB
377
+ # AFWqKhrzRvN4Vzcw/HXjT9aFI/H8+ZU5myXm93KKmMN31GT8Ffs2wklRLHiIY1UJ
378
+ # RjkA/GnUypsp+6M/wMkAmxMdsJiJ3HjyzXyFzVOdr2LiYWajFCpFh0qYQitQ/Bu1
379
+ # nggwCfrkLdcJiXn5CeaIzn0buGqim8FTYAnoo7id160fHLjsmEHw9g6A++T/350Q
380
+ # p+sAul9Kjxo6UrTqvwlJFTU2WZoPVNKyG39+XgmtdlSKdG3K0gVnK3br/5iyJpU4
381
+ # GYhEFOUKWaJr5yI+RCHSPxzAm+18SLLYkgyRTzxmlK9dAlPrnuKe5NMfhgFknADC
382
+ # 6Vp0dQ094XmIvxwBl8kZI4DXNlpflhaxYwzGRkA7zl011Fk+Q5oYrsPJy8P7mxNf
383
+ # arXH4PMFw1nfJ2Ir3kHJU7n/NBBn9iYymHv+XEKUgZSCnawKi8ZLFUrTmJBFYDOA
384
+ # 4CPe+AOk9kVH5c64A0JH6EE2cXet/aLol3ROLtoeHYxayB6a1cLwxiKoT5u92Bya
385
+ # UcQvmvZfpyeXupYuhVfAYOd4Vn9q78KVmksRAsiCnMkaBXy6cbVOepls9Oie1FqY
386
+ # yJ+/jbsYXEP10Cro4mLueATbvdH7WwqocH7wl4R44wgDXUcsY6glOJcB0j862uXl
387
+ # 9uab3H4szP8XTE0AotjWAQ64i+7m4HJViSwnGWH2dwGMMIIGrjCCBJagAwIBAgIQ
388
+ # BzY3tyRUfNhHrP0oZipeWzANBgkqhkiG9w0BAQsFADBiMQswCQYDVQQGEwJVUzEV
389
+ # MBMGA1UEChMMRGlnaUNlcnQgSW5jMRkwFwYDVQQLExB3d3cuZGlnaWNlcnQuY29t
390
+ # MSEwHwYDVQQDExhEaWdpQ2VydCBUcnVzdGVkIFJvb3QgRzQwHhcNMjIwMzIzMDAw
391
+ # MDAwWhcNMzcwMzIyMjM1OTU5WjBjMQswCQYDVQQGEwJVUzEXMBUGA1UEChMORGln
392
+ # aUNlcnQsIEluYy4xOzA5BgNVBAMTMkRpZ2lDZXJ0IFRydXN0ZWQgRzQgUlNBNDA5
393
+ # NiBTSEEyNTYgVGltZVN0YW1waW5nIENBMIICIjANBgkqhkiG9w0BAQEFAAOCAg8A
394
+ # MIICCgKCAgEAxoY1BkmzwT1ySVFVxyUDxPKRN6mXUaHW0oPRnkyibaCwzIP5WvYR
395
+ # oUQVQl+kiPNo+n3znIkLf50fng8zH1ATCyZzlm34V6gCff1DtITaEfFzsbPuK4CE
396
+ # iiIY3+vaPcQXf6sZKz5C3GeO6lE98NZW1OcoLevTsbV15x8GZY2UKdPZ7Gnf2ZCH
397
+ # RgB720RBidx8ald68Dd5n12sy+iEZLRS8nZH92GDGd1ftFQLIWhuNyG7QKxfst5K
398
+ # fc71ORJn7w6lY2zkpsUdzTYNXNXmG6jBZHRAp8ByxbpOH7G1WE15/tePc5OsLDni
399
+ # pUjW8LAxE6lXKZYnLvWHpo9OdhVVJnCYJn+gGkcgQ+NDY4B7dW4nJZCYOjgRs/b2
400
+ # nuY7W+yB3iIU2YIqx5K/oN7jPqJz+ucfWmyU8lKVEStYdEAoq3NDzt9KoRxrOMUp
401
+ # 88qqlnNCaJ+2RrOdOqPVA+C/8KI8ykLcGEh/FDTP0kyr75s9/g64ZCr6dSgkQe1C
402
+ # vwWcZklSUPRR8zZJTYsg0ixXNXkrqPNFYLwjjVj33GHek/45wPmyMKVM1+mYSlg+
403
+ # 0wOI/rOP015LdhJRk8mMDDtbiiKowSYI+RQQEgN9XyO7ZONj4KbhPvbCdLI/Hgl2
404
+ # 7KtdRnXiYKNYCQEoAA6EVO7O6V3IXjASvUaetdN2udIOa5kM0jO0zbECAwEAAaOC
405
+ # AV0wggFZMBIGA1UdEwEB/wQIMAYBAf8CAQAwHQYDVR0OBBYEFLoW2W1NhS9zKXaa
406
+ # L3WMaiCPnshvMB8GA1UdIwQYMBaAFOzX44LScV1kTN8uZz/nupiuHA9PMA4GA1Ud
407
+ # DwEB/wQEAwIBhjATBgNVHSUEDDAKBggrBgEFBQcDCDB3BggrBgEFBQcBAQRrMGkw
408
+ # JAYIKwYBBQUHMAGGGGh0dHA6Ly9vY3NwLmRpZ2ljZXJ0LmNvbTBBBggrBgEFBQcw
409
+ # AoY1aHR0cDovL2NhY2VydHMuZGlnaWNlcnQuY29tL0RpZ2lDZXJ0VHJ1c3RlZFJv
410
+ # b3RHNC5jcnQwQwYDVR0fBDwwOjA4oDagNIYyaHR0cDovL2NybDMuZGlnaWNlcnQu
411
+ # Y29tL0RpZ2lDZXJ0VHJ1c3RlZFJvb3RHNC5jcmwwIAYDVR0gBBkwFzAIBgZngQwB
412
+ # BAIwCwYJYIZIAYb9bAcBMA0GCSqGSIb3DQEBCwUAA4ICAQB9WY7Ak7ZvmKlEIgF+
413
+ # ZtbYIULhsBguEE0TzzBTzr8Y+8dQXeJLKftwig2qKWn8acHPHQfpPmDI2AvlXFvX
414
+ # bYf6hCAlNDFnzbYSlm/EUExiHQwIgqgWvalWzxVzjQEiJc6VaT9Hd/tydBTX/6tP
415
+ # iix6q4XNQ1/tYLaqT5Fmniye4Iqs5f2MvGQmh2ySvZ180HAKfO+ovHVPulr3qRCy
416
+ # Xen/KFSJ8NWKcXZl2szwcqMj+sAngkSumScbqyQeJsG33irr9p6xeZmBo1aGqwpF
417
+ # yd/EjaDnmPv7pp1yr8THwcFqcdnGE4AJxLafzYeHJLtPo0m5d2aR8XKc6UsCUqc3
418
+ # fpNTrDsdCEkPlM05et3/JWOZJyw9P2un8WbDQc1PtkCbISFA0LcTJM3cHXg65J6t
419
+ # 5TRxktcma+Q4c6umAU+9Pzt4rUyt+8SVe+0KXzM5h0F4ejjpnOHdI/0dKNPH+ejx
420
+ # mF/7K9h+8kaddSweJywm228Vex4Ziza4k9Tm8heZWcpw8De/mADfIBZPJ/tgZxah
421
+ # ZrrdVcA6KYawmKAr7ZVBtzrVFZgxtGIJDwq9gdkT/r+k0fNX2bwE+oLeMt8EifAA
422
+ # zV3C+dAjfwAL5HYCJtnwZXZCpimHCUcr5n8apIUP/JiW9lVUKx+A+sDyDivl1vup
423
+ # L0QVSucTDh3bNzgaoSv27dZ8/DCCBY0wggR1oAMCAQICEA6bGI750C3n79tQ4ghA
424
+ # GFowDQYJKoZIhvcNAQEMBQAwZTELMAkGA1UEBhMCVVMxFTATBgNVBAoTDERpZ2lD
425
+ # ZXJ0IEluYzEZMBcGA1UECxMQd3d3LmRpZ2ljZXJ0LmNvbTEkMCIGA1UEAxMbRGln
426
+ # aUNlcnQgQXNzdXJlZCBJRCBSb290IENBMB4XDTIyMDgwMTAwMDAwMFoXDTMxMTEw
427
+ # OTIzNTk1OVowYjELMAkGA1UEBhMCVVMxFTATBgNVBAoTDERpZ2lDZXJ0IEluYzEZ
428
+ # MBcGA1UECxMQd3d3LmRpZ2ljZXJ0LmNvbTEhMB8GA1UEAxMYRGlnaUNlcnQgVHJ1
429
+ # c3RlZCBSb290IEc0MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAv+aQ
430
+ # c2jeu+RdSjwwIjBpM+zCpyUuySE98orYWcLhKac9WKt2ms2uexuEDcQwH/MbpDgW
431
+ # 61bGl20dq7J58soR0uRf1gU8Ug9SH8aeFaV+vp+pVxZZVXKvaJNwwrK6dZlqczKU
432
+ # 0RBEEC7fgvMHhOZ0O21x4i0MG+4g1ckgHWMpLc7sXk7Ik/ghYZs06wXGXuxbGrzr
433
+ # yc/NrDRAX7F6Zu53yEioZldXn1RYjgwrt0+nMNlW7sp7XeOtyU9e5TXnMcvak17c
434
+ # jo+A2raRmECQecN4x7axxLVqGDgDEI3Y1DekLgV9iPWCPhCRcKtVgkEy19sEcypu
435
+ # kQF8IUzUvK4bA3VdeGbZOjFEmjNAvwjXWkmkwuapoGfdpCe8oU85tRFYF/ckXEaP
436
+ # ZPfBaYh2mHY9WV1CdoeJl2l6SPDgohIbZpp0yt5LHucOY67m1O+SkjqePdwA5EUl
437
+ # ibaaRBkrfsCUtNJhbesz2cXfSwQAzH0clcOP9yGyshG3u3/y1YxwLEFgqrFjGESV
438
+ # GnZifvaAsPvoZKYz0YkH4b235kOkGLimdwHhD5QMIR2yVCkliWzlDlJRR3S+Jqy2
439
+ # QXXeeqxfjT/JvNNBERJb5RBQ6zHFynIWIgnffEx1P2PsIV/EIFFrb7GrhotPwtZF
440
+ # X50g/KEexcCPorF+CiaZ9eRpL5gdLfXZqbId5RsCAwEAAaOCATowggE2MA8GA1Ud
441
+ # EwEB/wQFMAMBAf8wHQYDVR0OBBYEFOzX44LScV1kTN8uZz/nupiuHA9PMB8GA1Ud
442
+ # IwQYMBaAFEXroq/0ksuCMS1Ri6enIZ3zbcgPMA4GA1UdDwEB/wQEAwIBhjB5Bggr
443
+ # BgEFBQcBAQRtMGswJAYIKwYBBQUHMAGGGGh0dHA6Ly9vY3NwLmRpZ2ljZXJ0LmNv
444
+ # bTBDBggrBgEFBQcwAoY3aHR0cDovL2NhY2VydHMuZGlnaWNlcnQuY29tL0RpZ2lD
445
+ # ZXJ0QXNzdXJlZElEUm9vdENBLmNydDBFBgNVHR8EPjA8MDqgOKA2hjRodHRwOi8v
446
+ # Y3JsMy5kaWdpY2VydC5jb20vRGlnaUNlcnRBc3N1cmVkSURSb290Q0EuY3JsMBEG
447
+ # A1UdIAQKMAgwBgYEVR0gADANBgkqhkiG9w0BAQwFAAOCAQEAcKC/Q1xV5zhfoKN0
448
+ # Gz22Ftf3v1cHvZqsoYcs7IVeqRq7IviHGmlUIu2kiHdtvRoU9BNKei8ttzjv9P+A
449
+ # ufih9/Jy3iS8UgPITtAq3votVs/59PesMHqai7Je1M/RQ0SbQyHrlnKhSLSZy51P
450
+ # pwYDE3cnRNTnf+hZqPC/Lwum6fI0POz3A8eHqNJMQBk1RmppVLC4oVaO7KTVPeix
451
+ # 3P0c2PR3WlxUjG/voVA9/HYJaISfb8rbII01YBwCA8sgsKxYoA5AY8WYIsGyWfVV
452
+ # a88nq2x2zm8jLfR+cWojayL/ErhULSd+2DrZ8LaHlv1b0VysGMNNn3O3AamfV6pe
453
+ # KOK5lDGCA3YwggNyAgEBMHcwYzELMAkGA1UEBhMCVVMxFzAVBgNVBAoTDkRpZ2lD
454
+ # ZXJ0LCBJbmMuMTswOQYDVQQDEzJEaWdpQ2VydCBUcnVzdGVkIEc0IFJTQTQwOTYg
455
+ # U0hBMjU2IFRpbWVTdGFtcGluZyBDQQIQDE1pckuU+jwqSj0pB4A9WjANBglghkgB
456
+ # ZQMEAgEFAKCB0TAaBgkqhkiG9w0BCQMxDQYLKoZIhvcNAQkQAQQwHAYJKoZIhvcN
457
+ # AQkFMQ8XDTIyMTAyNDE4MzMzOFowKwYLKoZIhvcNAQkQAgwxHDAaMBgwFgQU84ci
458
+ # TYYzgpI1qZS8vY+W6f4cfHMwLwYJKoZIhvcNAQkEMSIEILoHmtH34MMtLSezOEUS
459
+ # 8z6MwtqV/PFPq/sNVq5aJnKMMDcGCyqGSIb3DQEJEAIvMSgwJjAkMCIEIMf04b4y
460
+ # KIkgq+ImOr4axPxP5ngcLWTQTIB1V6Ajtbb6MA0GCSqGSIb3DQEBAQUABIICAEtb
461
+ # WINxaVTjBdclvuFwJT/uHWvlOdcKzc1o+toRkFb1OA7shEdXFvjNU549TilTs8qQ
462
+ # bly8CbFcz3JzLVLrNKO7lr4GXd2iyJV5sv/XU4ED866fznOnFWtZJvxKGOdqN0W7
463
+ # 01pw7mIJ8+2aRqpow1ppPzju7VagRQ8fKmtj9Sg5N8Ja3+AehpjwM/PYzctan/1m
464
+ # ytIK/HCw5k/MeGmPVBs/fqbN0DT4KGrJ7YMySdYZMs0U9V7Ak7PelZLgw8BkNi1Y
465
+ # Rb9i+7/t9AaBlVYMy/6+gzdsnarnlSzV8/6Est8w4Ie7sBxx3Tpsokopb+oPF///
466
+ # 2cA3jMNToO9YfsqvgpTEkWwjWanC2cd26K8ikw0uu0klmaxNvYpP459/QU3JMyFj
467
+ # I4ReTxVXLZrQlzCDUUdLmLSeV1AugCOYOHM2RAv4r+3qxk0jBCfA8RRK+prLNjXE
468
+ # af1QEbeRRNr0418MtnBdIzxHnW8yffWfHmtDNJoyPqggkRU3Mb8Myu8QPD3ZiCPj
469
+ # F+HsKUntyCV64hr9BNLmkpbw+kUvGtC0/7sZF9Gyp/DKnnbQu8vSR+CaZQqVQxJo
470
+ # UeI7m44utNTSSZCJ9JV7bnniwqztrP/r2PTAxkUywoCzif6R863qJ/uQA0QQjq8t
471
+ # +aR822g6YVyJsLYQKbpEgshG2QwzGHun5HkvawJ8
472
+ # SIG # End signature block
llmrag/Scripts/activate ADDED
@@ -0,0 +1,69 @@
+ # This file must be used with "source bin/activate" *from bash*
+ # you cannot run it directly
+
+ deactivate () {
+     # reset old environment variables
+     if [ -n "${_OLD_VIRTUAL_PATH:-}" ] ; then
+         PATH="${_OLD_VIRTUAL_PATH:-}"
+         export PATH
+         unset _OLD_VIRTUAL_PATH
+     fi
+     if [ -n "${_OLD_VIRTUAL_PYTHONHOME:-}" ] ; then
+         PYTHONHOME="${_OLD_VIRTUAL_PYTHONHOME:-}"
+         export PYTHONHOME
+         unset _OLD_VIRTUAL_PYTHONHOME
+     fi
+
+     # This should detect bash and zsh, which have a hash command that must
+     # be called to get it to forget past commands. Without forgetting
+     # past commands the $PATH changes we made may not be respected
+     if [ -n "${BASH:-}" -o -n "${ZSH_VERSION:-}" ] ; then
+         hash -r 2> /dev/null
+     fi
+
+     if [ -n "${_OLD_VIRTUAL_PS1:-}" ] ; then
+         PS1="${_OLD_VIRTUAL_PS1:-}"
+         export PS1
+         unset _OLD_VIRTUAL_PS1
+     fi
+
+     unset VIRTUAL_ENV
+     unset VIRTUAL_ENV_PROMPT
+     if [ ! "${1:-}" = "nondestructive" ] ; then
+         # Self destruct!
+         unset -f deactivate
+     fi
+ }
+
+ # unset irrelevant variables
+ deactivate nondestructive
+
+ VIRTUAL_ENV="D:\llm-chatbot\rag-youtube-assistant\llmrag"
+ export VIRTUAL_ENV
+
+ _OLD_VIRTUAL_PATH="$PATH"
+ PATH="$VIRTUAL_ENV/Scripts:$PATH"
+ export PATH
+
+ # unset PYTHONHOME if set
+ # this will fail if PYTHONHOME is set to the empty string (which is bad anyway)
+ # could use `if (set -u; : $PYTHONHOME) ;` in bash
+ if [ -n "${PYTHONHOME:-}" ] ; then
+     _OLD_VIRTUAL_PYTHONHOME="${PYTHONHOME:-}"
+     unset PYTHONHOME
+ fi
+
+ if [ -z "${VIRTUAL_ENV_DISABLE_PROMPT:-}" ] ; then
+     _OLD_VIRTUAL_PS1="${PS1:-}"
+     PS1="(llmrag) ${PS1:-}"
+     export PS1
+     VIRTUAL_ENV_PROMPT="(llmrag) "
+     export VIRTUAL_ENV_PROMPT
+ fi
+
+ # This should detect bash and zsh, which have a hash command that must
+ # be called to get it to forget past commands. Without forgetting
+ # past commands the $PATH changes we made may not be respected
+ if [ -n "${BASH:-}" -o -n "${ZSH_VERSION:-}" ] ; then
+     hash -r 2> /dev/null
+ fi
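Note on the script above: activation mainly exports `VIRTUAL_ENV` and prepends the environment's `Scripts` directory to `PATH`. The `VIRTUAL_ENV` value is an absolute path from the machine where the environment was created, so the committed script is unlikely to work on another checkout. A minimal Python sketch for checking from inside the interpreter whether a virtual environment is actually in effect is given here; `in_virtualenv` and `describe_environment` are hypothetical helper names, not part of the repository.

```python
import os
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs from a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def describe_environment() -> dict:
    """Collect the values that the activate script manipulates."""
    return {
        "in_virtualenv": in_virtualenv(),
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # Exported by the activate script; None when it was never sourced.
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Comparing `sys.prefix` with `sys.base_prefix` also works when the environment's interpreter is launched directly without sourcing `activate`, which is why the sketch does not rely on `VIRTUAL_ENV` alone.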
llmrag/Scripts/activate.bat ADDED
@@ -0,0 +1,34 @@
+ @echo off
+
+ rem This file is UTF-8 encoded, so we need to update the current code page while executing it
+ for /f "tokens=2 delims=:." %%a in ('"%SystemRoot%\System32\chcp.com"') do (
+     set _OLD_CODEPAGE=%%a
+ )
+ if defined _OLD_CODEPAGE (
+     "%SystemRoot%\System32\chcp.com" 65001 > nul
+ )
+
+ set VIRTUAL_ENV=D:\llm-chatbot\rag-youtube-assistant\llmrag
+
+ if not defined PROMPT set PROMPT=$P$G
+
+ if defined _OLD_VIRTUAL_PROMPT set PROMPT=%_OLD_VIRTUAL_PROMPT%
+ if defined _OLD_VIRTUAL_PYTHONHOME set PYTHONHOME=%_OLD_VIRTUAL_PYTHONHOME%
+
+ set _OLD_VIRTUAL_PROMPT=%PROMPT%
+ set PROMPT=(llmrag) %PROMPT%
+
+ if defined PYTHONHOME set _OLD_VIRTUAL_PYTHONHOME=%PYTHONHOME%
+ set PYTHONHOME=
+
+ if defined _OLD_VIRTUAL_PATH set PATH=%_OLD_VIRTUAL_PATH%
+ if not defined _OLD_VIRTUAL_PATH set _OLD_VIRTUAL_PATH=%PATH%
+
+ set PATH=%VIRTUAL_ENV%\Scripts;%PATH%
+ set VIRTUAL_ENV_PROMPT=(llmrag)
+
+ :END
+ if defined _OLD_CODEPAGE (
+     "%SystemRoot%\System32\chcp.com" %_OLD_CODEPAGE% > nul
+     set _OLD_CODEPAGE=
+ )
llmrag/Scripts/deactivate.bat ADDED
@@ -0,0 +1,22 @@
+ @echo off
+
+ if defined _OLD_VIRTUAL_PROMPT (
+     set "PROMPT=%_OLD_VIRTUAL_PROMPT%"
+ )
+ set _OLD_VIRTUAL_PROMPT=
+
+ if defined _OLD_VIRTUAL_PYTHONHOME (
+     set "PYTHONHOME=%_OLD_VIRTUAL_PYTHONHOME%"
+     set _OLD_VIRTUAL_PYTHONHOME=
+ )
+
+ if defined _OLD_VIRTUAL_PATH (
+     set "PATH=%_OLD_VIRTUAL_PATH%"
+ )
+
+ set _OLD_VIRTUAL_PATH=
+
+ set VIRTUAL_ENV=
+ set VIRTUAL_ENV_PROMPT=
+
+ :END
llmrag/Scripts/pip.exe ADDED
Binary file (108 kB). View file
 
llmrag/Scripts/pip3.10.exe ADDED
Binary file (108 kB). View file
 
llmrag/Scripts/pip3.11.exe ADDED
Binary file (108 kB). View file
 
llmrag/Scripts/pip3.exe ADDED
Binary file (108 kB). View file
 
llmrag/Scripts/python.exe ADDED
Binary file (268 kB). View file
 
llmrag/Scripts/pythonw.exe ADDED
Binary file (256 kB). View file
 
llmrag/pyvenv.cfg ADDED
@@ -0,0 +1,5 @@
+ home = C:\Python311
+ include-system-site-packages = false
+ version = 3.11.0
+ executable = C:\Python311\python.exe
+ command = C:\Python311\python.exe -m venv D:\llm-chatbot\rag-youtube-assistant\llmrag
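Note on `pyvenv.cfg`: the file is a flat list of `key = value` lines, which is what `Get-PyVenvConfig` in `Activate.ps1` reads to find an optional `prompt` entry. A minimal Python sketch of that kind of parser follows; `parse_pyvenv_cfg` and `SAMPLE` are hypothetical names used only for illustration, and lowercasing the keys is an assumption of this sketch.

```python
def parse_pyvenv_cfg(text: str) -> dict:
    """Parse `key = value` lines as found in a pyvenv.cfg file."""
    config = {}
    for raw_line in text.splitlines():
        # Split at the first "=" only, so values may themselves contain "=".
        key, separator, value = raw_line.partition("=")
        if not separator:
            continue  # skip lines that carry no key/value pair
        config[key.strip().lower()] = value.strip()
    return config


# Sample content taken from the pyvenv.cfg shown in this commit.
SAMPLE = """\
home = C:\\Python311
include-system-site-packages = false
version = 3.11.0
"""

if __name__ == "__main__":
    print(parse_pyvenv_cfg(SAMPLE))
```

All values come back as strings, so a flag such as `include-system-site-packages` still has to be compared against `"true"` or `"false"` by the caller.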
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ streamlit
+ youtube_transcript_api
+ sentence-transformers
+ google-api-python-client
+ google-auth-httplib2
+ google-auth-oauthlib
+ pandas
+ numpy
+ scikit-learn
+ elasticsearch
+ ollama
+ requests
+ matplotlib
+ tqdm
run-docker-compose-windows.ps1 ADDED
@@ -0,0 +1,23 @@
+ # Define the path to the .env file
+ $envPath = ".\.env"
+
+ # Check if the .env file exists
+ if (Test-Path $envPath) {
+     # Read the .env file
+     $envContent = Get-Content $envPath
+
+     # Parse the environment variables
+     foreach ($line in $envContent) {
+         if ($line -match '^([^=]+)=(.*)$') {
+             $name = $matches[1]
+             $value = $matches[2]
+             [Environment]::SetEnvironmentVariable($name, $value, "Process")
+         }
+     }
+
+     # Run docker-compose
+     docker-compose up --build
+ }
+ else {
+     Write-Error "The .env file was not found at $envPath"
+ }
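Note on the `.env` parsing above: the pattern `^([^=]+)=(.*)$` takes everything after the first `=` as the value, so a quoted entry such as the one in `.env_template` (`YOUTUBE_API_KEY='YOUR YOUTUBE_API_KEY'`) would be exported with its quotes included, and comment lines containing `=` would also match. A minimal Python sketch of the same parsing with those two cases handled is shown here; `parse_env_lines` is a hypothetical helper, and the quote and comment handling are assumptions about the intended `.env` conventions rather than behavior of the script.

```python
import re

# Same pattern as the PowerShell script: name before the first "=", value after it.
ENV_LINE = re.compile(r"^([^=]+)=(.*)$")


def parse_env_lines(lines):
    """Return a dict of variables parsed from `.env`-style lines."""
    variables = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and comments are ignored
        match = ENV_LINE.match(line)
        if not match:
            continue  # no "=" on this line, nothing to set
        name = match.group(1).strip()
        value = match.group(2).strip()
        # Assumption: a matching pair of surrounding quotes is not part of the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        variables[name] = value
    return variables


if __name__ == "__main__":
    sample = ["# API credentials", "YOUTUBE_API_KEY='YOUR YOUTUBE_API_KEY'", ""]
    print(parse_env_lines(sample))
```

The returned dictionary can be merged into `os.environ` before launching a subprocess, which corresponds to the `"Process"` scope used in the PowerShell version.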
run-docker-compose.sh ADDED
@@ -0,0 +1,19 @@
+ #!/bin/bash
+
+ # Start Ollama
+ ollama serve &
+
+ # Wait for Ollama to start
+ sleep 10
+
+ # Run Phi model to ensure it's loaded
+ ollama run phi "hello" &
+
+ # Generate ground truth
+ python generate_ground_truth.py
+
+ # Run RAG evaluation
+ python rag_evaluation.py
+
+ # Start the Streamlit app
+ streamlit run main.py
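Note on the startup sequence above: the fixed `sleep 10` assumes Ollama is ready after ten seconds, which may be too short on a slow machine and wasteful on a fast one. A minimal Python sketch of polling for readiness instead is given here. `wait_until` and `ollama_is_up` are hypothetical helpers, and the probe assumes Ollama listens on its default local address `http://localhost:11434`; neither is part of the repository.

```python
import time
import urllib.error
import urllib.request


def wait_until(check, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call `check()` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def ollama_is_up(url="http://localhost:11434"):
    """Assumed probe: any HTTP answer from the server counts as ready."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused or timed out: not ready yet


if __name__ == "__main__":
    ready = wait_until(ollama_is_up, timeout=60.0)
    print("Ollama ready" if ready else "Ollama did not start in time")
```

Passing `sleep` and `clock` as parameters keeps the polling loop testable without real delays or a running server.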