first modification
- .env_template +1 -0
- Dockerfile +30 -0
- README.md +120 -2
- app/data_processor.py +145 -0
- app/database.py +103 -0
- app/elasticsearch_handler.py +37 -0
- app/evaluation.py +48 -0
- app/generate_ground_truth.py +67 -0
- app/main.py +287 -0
- app/minsearch.py +96 -0
- app/query_rewriter.py +33 -0
- app/rag.py +31 -0
- app/rag_evaluation.py +193 -0
- app/transcript_extractor.py +85 -0
- config/config.yaml +0 -0
- data/sqlite.db +0 -0
- docker-compose.yaml +42 -0
- grafana/provisioning/dashboards/rag_evaluation.json +129 -0
- grafana/provisioning/datasources/sqlite.yaml +7 -0
- llmrag/Scripts/Activate.ps1 +472 -0
- llmrag/Scripts/activate +69 -0
- llmrag/Scripts/activate.bat +34 -0
- llmrag/Scripts/deactivate.bat +22 -0
- llmrag/Scripts/pip.exe +0 -0
- llmrag/Scripts/pip3.10.exe +0 -0
- llmrag/Scripts/pip3.11.exe +0 -0
- llmrag/Scripts/pip3.exe +0 -0
- llmrag/Scripts/python.exe +0 -0
- llmrag/Scripts/pythonw.exe +0 -0
- llmrag/pyvenv.cfg +5 -0
- requirements.txt +14 -0
- run-docker-compose-windows.ps1 +23 -0
- run-docker-compose.sh +19 -0
.env_template
ADDED
@@ -0,0 +1 @@
+YOUTUBE_API_KEY='YOUR YOUTUBE_API_KEY'
Dockerfile
ADDED
@@ -0,0 +1,30 @@
+# Use an official Python runtime as a parent image
+FROM python:3.9-slim
+
+# Set the working directory in the container
+WORKDIR /app
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    curl \
+    software-properties-common \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy the requirements file into the container
+COPY requirements.txt .
+
+# Install any needed packages specified in requirements.txt
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy the application code into the container
+COPY app/ ./app/
+COPY config/ ./config/
+COPY data/ ./data/
+COPY grafana/ ./grafana/
+
+# Make port 8501 available to the world outside this container
+EXPOSE 8501
+
+# Run the Streamlit app when the container launches
+CMD ["streamlit", "run", "app/main.py", "--server.port=8501", "--server.address=0.0.0.0"]
README.md
CHANGED
@@ -1,2 +1,120 @@
-#
-
+# YouTube Assistant
+
+## Problem Description
+
+In the era of abundant video content on YouTube, users often struggle to efficiently extract specific information or insights from lengthy videos without watching them in their entirety. This challenge is particularly acute when dealing with educational content, tutorials, or informative videos where key points may be scattered throughout the video's duration.
+
+The YouTube Assistant project addresses this problem by providing a Retrieval-Augmented Generation (RAG) application that allows users to interact with and query video transcripts directly. This solution enables users to quickly access relevant information from YouTube videos without the need to watch them completely, saving time and improving the efficiency of information retrieval from video content.
+
+## Data
+
+The YouTube Assistant utilizes data pulled in real-time using the YouTube Data API v3. This data is then processed and stored in two databases:
+
+1. SQLite database: For structured data storage
+2. Elasticsearch vector database: For efficient similarity searches on embedded text
+
+### Data Schema
+
+The main columns in our data structure are:
+
+```json
+{
+  "content": {"type": "text"},
+  "video_id": {"type": "keyword"},
+  "segment_id": {"type": "keyword"},
+  "start_time": {"type": "float"},
+  "duration": {"type": "float"},
+  "title": {"type": "text"},
+  "author": {"type": "keyword"},
+  "upload_date": {"type": "date"},
+  "view_count": {"type": "integer"},
+  "like_count": {"type": "integer"},
+  "comment_count": {"type": "integer"},
+  "video_duration": {"type": "text"}
+}
+```
+
+This schema allows for comprehensive storage of video metadata alongside the transcript content, enabling rich querying and analysis capabilities.
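Review note: the schema above maps one transcript segment plus its video metadata to one indexed document. A minimal sketch of that mapping, assuming the segment and metadata dictionaries use the same keys that `app/data_processor.py` reads (`text`, `start`, `duration`, and the metadata fields); `make_segment_doc` and the sample values in the usage note are hypothetical and not part of the commit:

```python
from typing import Any, Dict


def make_segment_doc(video_id: str, index: int,
                     segment: Dict[str, Any],
                     metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Build one document with exactly the fields listed in the README schema."""
    return {
        "content": segment["text"],
        "video_id": video_id,
        # Same convention as data_processor.py: "<video_id>_<segment index>"
        "segment_id": f"{video_id}_{index}",
        "start_time": float(segment["start"]),
        "duration": float(segment["duration"]),
        "title": metadata["title"],
        "author": metadata["author"],
        "upload_date": metadata["upload_date"],
        "view_count": int(metadata["view_count"]),
        "like_count": int(metadata["like_count"]),
        "comment_count": int(metadata["comment_count"]),
        # Note the rename: the metadata key is "duration", the schema field is "video_duration"
        "video_duration": metadata["duration"],
    }
```

Calling it with an illustrative segment such as `{"text": "hello", "start": 1.5, "duration": 2}` yields a dictionary whose keys match the twelve schema fields.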
+
+## Functionality
+
+The YouTube Assistant offers the following key features:
+
+1. **Real-time Data Extraction**: Utilizes the YouTube Data API v3 to fetch video data and transcripts on-demand.
+
+2. **Efficient Data Storage**: Stores structured data in SQLite and uses Elasticsearch for vector embeddings, allowing for fast retrieval and similarity searches.
+
+3. **Interactive Querying**: Provides a chat interface where users can ask questions about the video transcripts that have been downloaded or extracted in real-time.
+
+4. **Contextual Understanding**: Leverages RAG technology to understand the context of user queries and provide relevant information from the video transcripts.
+
+5. **Metadata Analysis**: Allows users to query not just the content of the videos but also metadata such as view counts, likes, and upload dates.
+
+6. **Time-stamped Responses**: Can provide information about specific segments of videos, including start times and durations.
+
+By combining these features, the YouTube Assistant empowers users to efficiently extract insights and information from YouTube videos without the need to watch them in full, significantly enhancing the way people interact with and learn from video content.
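Review note: the time-stamped responses feature depends on turning a segment's `start_time` (seconds, stored as a float) into something a reader can use. A minimal sketch, assuming the standard YouTube watch URL with a `t=<seconds>s` parameter; `format_timestamp` and `segment_url` are hypothetical helper names that do not appear in the commit:

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS once the offset reaches an hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def segment_url(video_id: str, start_time: float) -> str:
    """Build a link that opens the video at the start of the segment."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_time)}s"
```

Fractional seconds are truncated rather than rounded, so a link never starts after the segment's first word.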
+
+## Project Structure
+
+The YouTube Assistant project is organized as follows:
+
+```
+youtube-rag-app/
+├── app/
+│   ├── main.py
+│   ├── ui.py
+│   ├── transcript_extractor.py
+│   ├── data_processor.py
+│   ├── elasticsearch_handler.py
+│   ├── database.py
+│   ├── rag.py
+│   ├── query_rewriter.py
+│   └── evaluation.py
+├── data/
+│   └── sqlite.db
+├── config/
+│   └── config.yaml
+├── requirements.txt
+├── Dockerfile
+└── docker-compose.yaml
+```
+
+### Directory and File Descriptions:
+
+- `app/`: Contains the main application code
+  - `main.py`: Entry point of the application
+  - `ui.py`: Handles the user interface
+  - `transcript_extractor.py`: Manages YouTube transcript extraction
+  - `data_processor.py`: Processes and prepares data for storage and analysis
+  - `elasticsearch_handler.py`: Manages interactions with Elasticsearch
+  - `database.py`: Handles SQLite database operations
+  - `rag.py`: Implements the Retrieval-Augmented Generation logic
+  - `query_rewriter.py`: Refines and optimizes user queries
+  - `evaluation.py`: Contains evaluation metrics and functions
+- `data/`: Stores the SQLite database
+- `config/`: Contains configuration files
+- `requirements.txt`: Lists all Python dependencies
+- `Dockerfile`: Defines the Docker image for the application
+- `docker-compose.yaml`: Orchestrates the application and its services
+
+## Getting Started
+
+git clone [email protected]:ganesh3/rag-youtube-assistant.git
+
+## Ingestion
+
+## Evaluation
+
+## Retrieval
+
+### RAG Flow
+
+## Monitoring
+
+
+## Usage Examples
+
+(Provide some example queries and interactions with the YouTube Assistant here.)
+
+## License
+GPL v3
app/data_processor.py
ADDED
@@ -0,0 +1,145 @@
+from minsearch import Index
+from sentence_transformers import SentenceTransformer
+import numpy as np
+from sklearn.metrics.pairwise import cosine_similarity
+import re
+from elasticsearch import Elasticsearch
+import os
+
+def clean_text(text):
+    # Remove special characters and extra whitespace
+    text = re.sub(r'[^\w\s]', '', text)
+    text = re.sub(r'\s+', ' ', text).strip()
+    return text
+
+class DataProcessor:
+    def __init__(self, text_fields=["content", "title", "description"],
+                 keyword_fields=["video_id", "start_time", "author", "upload_date"],
+                 embedding_model="all-MiniLM-L6-v2"):
+        self.text_index = Index(text_fields=text_fields, keyword_fields=keyword_fields)
+        self.embedding_model = SentenceTransformer(embedding_model)
+        self.documents = []
+        self.embeddings = []
+
+        # Use environment variables for Elasticsearch configuration
+        elasticsearch_host = os.getenv('ELASTICSEARCH_HOST', 'localhost')
+        elasticsearch_port = int(os.getenv('ELASTICSEARCH_PORT', 9200))
+
+        # Initialize Elasticsearch client with explicit scheme
+        self.es = Elasticsearch([f'http://{elasticsearch_host}:{elasticsearch_port}'])
+
+    def process_transcript(self, video_id, transcript_data):
+        metadata = transcript_data['metadata']
+        transcript = transcript_data['transcript']
+
+        for i, segment in enumerate(transcript):
+            cleaned_text = clean_text(segment['text'])
+            doc = {
+                "video_id": video_id,
+                "content": cleaned_text,
+                "start_time": segment['start'],
+                "duration": segment['duration'],
+                "segment_id": f"{video_id}_{i}",
+                "title": metadata['title'],
+                "author": metadata['author'],
+                "upload_date": metadata['upload_date'],
+                "view_count": metadata['view_count'],
+                "like_count": metadata['like_count'],
+                "comment_count": metadata['comment_count'],
+                "video_duration": metadata['duration']
+            }
+            self.documents.append(doc)
+            self.embeddings.append(self.embedding_model.encode(cleaned_text + " " + metadata['title']))
+
+    def build_index(self, index_name):
+        self.text_index.fit(self.documents)
+        embeddings = np.array(self.embeddings)  # local copy: self.embeddings stays a list so process_transcript can keep appending
+
+        # Create Elasticsearch index
+        if not self.es.indices.exists(index=index_name):
+            self.es.indices.create(index=index_name, body={
+                "mappings": {
+                    "properties": {
+                        "embedding": {"type": "dense_vector", "dims": embeddings.shape[1]},
+                        "content": {"type": "text"},
+                        "video_id": {"type": "keyword"},
+                        "segment_id": {"type": "keyword"},
+                        "start_time": {"type": "float"},
+                        "duration": {"type": "float"},
+                        "title": {"type": "text"},
+                        "author": {"type": "keyword"},
+                        "upload_date": {"type": "date"},
+                        "view_count": {"type": "integer"},
+                        "like_count": {"type": "integer"},
+                        "comment_count": {"type": "integer"},
+                        "video_duration": {"type": "text"}
+                    }
+                }
+            })
+
+        # Index documents in Elasticsearch
+        for doc, embedding in zip(self.documents, embeddings):
+            doc['embedding'] = embedding.tolist()
+            self.es.index(index=index_name, body=doc, id=doc['segment_id'])
+
+    def search(self, query, filter_dict={}, boost_dict={}, num_results=10, method='hybrid', index_name=None):
+        if method == 'text':
+            return self.text_search(query, filter_dict, boost_dict, num_results)
+        elif method == 'embedding':
+            return self.embedding_search(query, num_results, index_name)
+        else:  # hybrid search
+            text_results = self.text_search(query, filter_dict, boost_dict, num_results)
+            embedding_results = self.embedding_search(query, num_results, index_name)
+            return self.combine_results(text_results, embedding_results, num_results)
+
+    def text_search(self, query, filter_dict={}, boost_dict={}, num_results=10):
+        return self.text_index.search(query, filter_dict, boost_dict, num_results)
+
+    def embedding_search(self, query, num_results=10, index_name=None):
+        if index_name:
+            # Use Elasticsearch for embedding search
+            query_vector = self.embedding_model.encode(query).tolist()
+            script_query = {
+                "script_score": {
+                    "query": {"match_all": {}},
+                    "script": {
+                        "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
+                        "params": {"query_vector": query_vector}
+                    }
+                }
+            }
+            response = self.es.search(
+                index=index_name,
+                body={
+                    "size": num_results,
+                    "query": script_query,
+                    "_source": {"excludes": ["embedding"]}
+                }
+            )
+            return [hit['_source'] for hit in response['hits']['hits']]
+        else:
+            # Use in-memory embedding search
+            query_embedding = self.embedding_model.encode(query)
+            similarities = cosine_similarity([query_embedding], self.embeddings)[0]
+            top_indices = np.argsort(similarities)[::-1][:num_results]
+            return [self.documents[i] for i in top_indices]
+
+    def combine_results(self, text_results, embedding_results, num_results):
+        combined = []
+        for i in range(max(len(text_results), len(embedding_results))):
+            if i < len(text_results):
+                combined.append(text_results[i])
+            if i < len(embedding_results):
+                combined.append(embedding_results[i])
+
+        seen = set()
+        deduped = []
+        for doc in combined:
+            if doc['segment_id'] not in seen:
+                seen.add(doc['segment_id'])
+                deduped.append(doc)
+
+        return deduped[:num_results]
+
+    def process_query(self, query):
+        return clean_text(query)
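Review note: the hybrid path in `combine_results` alternates between the text and embedding result lists and keeps the first occurrence of each `segment_id`. A standalone sketch of the same merge, useful for checking the ordering without Elasticsearch or a loaded model; the function name `interleave_dedupe` and its `key` parameter are hypothetical:

```python
from typing import Any, Dict, List


def interleave_dedupe(text_results: List[Dict[str, Any]],
                      embedding_results: List[Dict[str, Any]],
                      num_results: int,
                      key: str = "segment_id") -> List[Dict[str, Any]]:
    """Alternate between both ranked lists, keep first occurrences, then truncate."""
    combined: List[Dict[str, Any]] = []
    for i in range(max(len(text_results), len(embedding_results))):
        if i < len(text_results):
            combined.append(text_results[i])
        if i < len(embedding_results):
            combined.append(embedding_results[i])

    seen = set()
    deduped: List[Dict[str, Any]] = []
    for doc in combined:
        if doc[key] not in seen:
            seen.add(doc[key])
            deduped.append(doc)
    return deduped[:num_results]
```

Because the text hit is appended first at every rank, a segment found by both methods keeps the position it earned in whichever list ranked it higher, with ties going to the text result.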
app/database.py
ADDED
@@ -0,0 +1,103 @@
+import sqlite3
+import os
+
+class DatabaseHandler:
+    def __init__(self, db_path='data/sqlite.db'):
+        self.db_path = db_path
+        self.conn = None
+        self.create_tables()
+
+    def create_tables(self):
+        with sqlite3.connect(self.db_path) as conn:
+            cursor = conn.cursor()
+            cursor.execute('''
+                CREATE TABLE IF NOT EXISTS videos (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    youtube_id TEXT UNIQUE,
+                    title TEXT,
+                    channel_name TEXT,
+                    processed_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+                )
+            ''')
+            cursor.execute('''
+                CREATE TABLE IF NOT EXISTS user_feedback (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    video_id INTEGER,
+                    query TEXT,
+                    feedback INTEGER,
+                    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    FOREIGN KEY (video_id) REFERENCES videos (id)
+                )
+            ''')
+            cursor.execute('''
+                CREATE TABLE IF NOT EXISTS embedding_models (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    model_name TEXT UNIQUE,
+                    description TEXT
+                )
+            ''')
+            cursor.execute('''
+                CREATE TABLE IF NOT EXISTS elasticsearch_indices (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    video_id INTEGER,
+                    index_name TEXT,
+                    embedding_model_id INTEGER,
+                    FOREIGN KEY (video_id) REFERENCES videos (id),
+                    FOREIGN KEY (embedding_model_id) REFERENCES embedding_models (id)
+                )
+            ''')
+            conn.commit()
+
+    def add_video(self, youtube_id, title, channel_name):
+        with sqlite3.connect(self.db_path) as conn:
+            cursor = conn.cursor()
+            cursor.execute('''
+                INSERT OR IGNORE INTO videos (youtube_id, title, channel_name)
+                VALUES (?, ?, ?)
+            ''', (youtube_id, title, channel_name))
+            conn.commit()
+            return cursor.lastrowid
+
+    def add_user_feedback(self, video_id, query, feedback):
+        with sqlite3.connect(self.db_path) as conn:
+            cursor = conn.cursor()
+            cursor.execute('''
+                INSERT INTO user_feedback (video_id, query, feedback)
+                VALUES (?, ?, ?)
+            ''', (video_id, query, feedback))
+            conn.commit()
+
+    def add_embedding_model(self, model_name, description):
+        with sqlite3.connect(self.db_path) as conn:
+            cursor = conn.cursor()
+            cursor.execute('''
+                INSERT OR IGNORE INTO embedding_models (model_name, description)
+                VALUES (?, ?)
+            ''', (model_name, description))
+            conn.commit()
+            return cursor.lastrowid
+
+    def add_elasticsearch_index(self, video_id, index_name, embedding_model_id):
+        with sqlite3.connect(self.db_path) as conn:
+            cursor = conn.cursor()
+            cursor.execute('''
+                INSERT INTO elasticsearch_indices (video_id, index_name, embedding_model_id)
+                VALUES (?, ?, ?)
+            ''', (video_id, index_name, embedding_model_id))
+            conn.commit()
+
+    def get_video_by_youtube_id(self, youtube_id):
+        with sqlite3.connect(self.db_path) as conn:
+            cursor = conn.cursor()
+            cursor.execute('SELECT * FROM videos WHERE youtube_id = ?', (youtube_id,))
+            return cursor.fetchone()
+
+    def get_elasticsearch_index(self, video_id, embedding_model_id):
+        with sqlite3.connect(self.db_path) as conn:
+            cursor = conn.cursor()
+            cursor.execute('''
+                SELECT index_name FROM elasticsearch_indices
+                WHERE video_id = ? AND embedding_model_id = ?
+            ''', (video_id, embedding_model_id))
+            result = cursor.fetchone()
+            return result[0] if result else None
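Review note: `add_video` and `add_embedding_model` return `cursor.lastrowid` after an `INSERT OR IGNORE`. When the row already exists the insert is skipped, so that value may not be the id of the existing row. A minimal sketch of a get-or-create variant that looks the id up explicitly; `create_videos_table` and `get_or_create_video_id` are hypothetical helpers, and the table definition is a trimmed copy of the `videos` table from this file:

```python
import sqlite3


def create_videos_table(conn: sqlite3.Connection) -> None:
    """Create a trimmed copy of the videos table, for demonstration only."""
    conn.execute('''
        CREATE TABLE IF NOT EXISTS videos (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            youtube_id TEXT UNIQUE,
            title TEXT,
            channel_name TEXT
        )
    ''')


def get_or_create_video_id(conn: sqlite3.Connection, youtube_id: str,
                           title: str, channel_name: str) -> int:
    """Insert the video if it is new, then return its primary key either way."""
    conn.execute(
        'INSERT OR IGNORE INTO videos (youtube_id, title, channel_name) VALUES (?, ?, ?)',
        (youtube_id, title, channel_name),
    )
    row = conn.execute(
        'SELECT id FROM videos WHERE youtube_id = ?', (youtube_id,)
    ).fetchone()
    conn.commit()
    return row[0]
```

The extra `SELECT` costs one indexed lookup, and in exchange callers that store the id as a foreign key (for example in `elasticsearch_indices`) get the same value on every call for the same video.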
app/elasticsearch_handler.py
ADDED
@@ -0,0 +1,37 @@
+from elasticsearch import Elasticsearch
+import uuid
+
+class ElasticsearchHandler:
+    def __init__(self, host='localhost', port=9200):
+        # Pass a full URL so the scheme is explicit, matching DataProcessor
+        self.es = Elasticsearch([f'http://{host}:{port}'])
+
+    def create_index(self, index_name):
+        if not self.es.indices.exists(index=index_name):
+            self.es.indices.create(index=index_name)
+
+    def index_document(self, index_name, doc_id, text, embedding):
+        body = {
+            'text': text,
+            'embedding': embedding.tolist()
+        }
+        self.es.index(index=index_name, id=doc_id, body=body)
+
+    def search(self, index_name, query_vector, top_k=5):
+        script_query = {
+            "script_score": {
+                "query": {"match_all": {}},
+                "script": {
+                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
+                    "params": {"query_vector": query_vector.tolist()}
+                }
+            }
+        }
+        response = self.es.search(
+            index=index_name,
+            body={
+                "size": top_k,
+                "query": script_query,
+                "_source": {"includes": ["text"]}
+            }
+        )
+        return [hit["_source"]["text"] for hit in response["hits"]["hits"]]
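Review note: both `ElasticsearchHandler.search` and `DataProcessor.embedding_search` build the same `script_score` request by hand. A minimal sketch of a shared builder, assuming the vector field is named `embedding` as in this commit; `build_cosine_query` is a hypothetical name. The `+ 1.0` shifts cosine similarity from [-1, 1] to [0, 2], since Elasticsearch does not accept negative scores:

```python
from typing import Any, Dict, Sequence


def build_cosine_query(query_vector: Sequence[float],
                       top_k: int = 5,
                       source_fields: Sequence[str] = ("text",)) -> Dict[str, Any]:
    """Return a search body that ranks every document by cosine similarity."""
    return {
        "size": top_k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Shift by 1.0 so the score is never negative
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
        "_source": {"includes": list(source_fields)},
    }
```

The returned dictionary can be passed as the `body` argument in the same way the handler does now, which keeps the two call sites from drifting apart.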
app/evaluation.py
ADDED
@@ -0,0 +1,48 @@
+from sklearn.metrics.pairwise import cosine_similarity
+import numpy as np
+import sqlite3
+
+class EvaluationSystem:
+    def __init__(self, data_processor, database_handler):
+        self.data_processor = data_processor
+        self.db_handler = database_handler
+
+    def relevance_scoring(self, query, retrieved_docs, top_k=5):
+        # process_query() only cleans text, so encode with the embedding model to get vectors
+        query_embedding = self.data_processor.embedding_model.encode(query)
+        doc_embeddings = [self.data_processor.embedding_model.encode(doc) for doc in retrieved_docs]
+
+        similarities = cosine_similarity([query_embedding], doc_embeddings)[0]
+        return np.mean(sorted(similarities, reverse=True)[:top_k])
+
+    def answer_similarity(self, generated_answer, reference_answer):
+        gen_embedding = self.data_processor.embedding_model.encode(generated_answer)
+        ref_embedding = self.data_processor.embedding_model.encode(reference_answer)
+        return cosine_similarity([gen_embedding], [ref_embedding])[0][0]
+
+    def human_evaluation(self, video_id, query):
+        # DatabaseHandler.conn is never opened (it stays None), so connect through db_path
+        with sqlite3.connect(self.db_handler.db_path) as conn:
+            cursor = conn.cursor()
+            cursor.execute('''
+                SELECT AVG(feedback) FROM user_feedback
+                WHERE video_id = ? AND query = ?
+            ''', (video_id, query))
+            result = cursor.fetchone()
+            return result[0] if result[0] is not None else 0
+
+    def evaluate_rag_performance(self, rag_system, test_queries, reference_answers, index_name):
+        relevance_scores = []
+        similarity_scores = []
+        human_scores = []
+
+        for query, reference in zip(test_queries, reference_answers):
+            retrieved_docs = rag_system.es_handler.search(index_name, rag_system.data_processor.process_query(query))
+            generated_answer = rag_system.query(index_name, query)
+
+            relevance_scores.append(self.relevance_scoring(query, retrieved_docs))
+            similarity_scores.append(self.answer_similarity(generated_answer, reference))
+            human_scores.append(self.human_evaluation(index_name, query))  # Assuming index_name can be used as video_id
+
+        return {
+            "avg_relevance_score": np.mean(relevance_scores),
+            "avg_similarity_score": np.mean(similarity_scores),
+            "avg_human_score": np.mean(human_scores)
+        }
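Review note: `relevance_scoring` averages the `top_k` highest cosine similarities between the query and the retrieved documents, which only makes sense on numeric vectors. A dependency-free sketch of that arithmetic, assuming the inputs are already embedding vectors of equal length; `cosine` and `mean_top_k_cosine` are hypothetical names, and plain lists stand in for the model output:

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_top_k_cosine(query_vec: Sequence[float],
                      doc_vecs: List[Sequence[float]],
                      top_k: int = 5) -> float:
    """Average the top_k largest similarities; 0.0 when nothing was retrieved."""
    sims = sorted((cosine(query_vec, d) for d in doc_vecs), reverse=True)[:top_k]
    return sum(sims) / len(sims) if sims else 0.0
```

For a query `[1, 0]` and documents `[1, 0]`, `[0, 1]`, `[-1, 0]` the similarities are 1, 0 and -1, so with `top_k=2` the score is the mean of 1 and 0. Returning 0.0 for an empty result list is a choice made here for the sketch, not behaviour taken from the commit.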
app/generate_ground_truth.py
ADDED
@@ -0,0 +1,67 @@
+import os
+import pandas as pd
+import json
+from youtube_transcript_api import YouTubeTranscriptApi
+from tqdm import tqdm
+import requests
+
+OLLAMA_HOST = os.getenv('OLLAMA_HOST', 'localhost')
+OLLAMA_PORT = os.getenv('OLLAMA_PORT', '11434')
+
+def get_transcript(video_id):
+    try:
+        transcript = YouTubeTranscriptApi.get_transcript(video_id)
+        return " ".join([entry['text'] for entry in transcript])
+    except Exception as e:
+        print(f"Error extracting transcript for video {video_id}: {str(e)}")
+        return None
+
+def generate_questions(transcript):
+    prompt_template = """
+    You are an AI assistant tasked with generating questions based on a YouTube video transcript.
+    Formulate 10 questions that a user might ask based on the provided transcript.
+    Make the questions specific to the content of the transcript.
+    The questions should be complete and not too short. Use as few words as possible from the transcript.
+
+    The transcript:
+
+    {transcript}
+
+    Provide the output in parsable JSON without using code blocks:
+
+    {{"questions": ["question1", "question2", ..., "question10"]}}
+    """.strip()
+
+    prompt = prompt_template.format(transcript=transcript)
+
+    response = requests.post(f'http://{OLLAMA_HOST}:{OLLAMA_PORT}/api/generate', json={
+        'model': 'phi3.5',
+        'prompt': prompt,
+        'stream': False  # ask for one JSON object; the endpoint streams NDJSON by default
+    })
+
+    if response.status_code == 200:
+        return json.loads(response.json()['response'])
+    else:
+        print(f"Error: {response.status_code} - {response.text}")
+        return None
+
+def main():
+    video_id = "zjkBMFhNj_g"
+    transcript = get_transcript(video_id)
+
+    if transcript:
+        questions = generate_questions(transcript)
+
+        if questions:
+            df = pd.DataFrame([(video_id, q) for q in questions['questions']], columns=['video_id', 'question'])
+
+            os.makedirs('data', exist_ok=True)
+            df.to_csv('data/ground-truth-retrieval.csv', index=False)
+            print("Ground truth data saved to data/ground-truth-retrieval.csv")
+        else:
+            print("Failed to generate questions.")
+    else:
+        print("Failed to generate ground truth data due to transcript retrieval error.")
+
+if __name__ == "__main__":
+    main()
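Review note: Ollama's `/api/generate` endpoint streams newline-delimited JSON unless the request sets `stream` to false, so a single `response.json()` call can fail on a streamed body. If streaming is kept, the fragments have to be joined before the model's JSON is parsed. A minimal sketch, assuming each streamed line is a JSON object carrying a `response` fragment and a `done` flag; `join_streamed_response` is a hypothetical helper:

```python
import json


def join_streamed_response(raw_body: str) -> str:
    """Concatenate the 'response' fragments of a newline-delimited JSON stream."""
    parts = []
    for line in raw_body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of the generation
    return "".join(parts)
```

The joined string is what `json.loads` should then receive to recover the `{"questions": [...]}` object. Models do not always emit valid JSON, so that second parse still needs its own error handling.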
app/main.py
ADDED
@@ -0,0 +1,287 @@
+import streamlit as st
+import pandas as pd
+from transcript_extractor import extract_video_id, get_transcript, get_channel_videos, process_videos
+from data_processor import DataProcessor
+from database import DatabaseHandler
+from rag import RAGSystem
+from query_rewriter import QueryRewriter
+from evaluation import EvaluationSystem
+from sentence_transformers import SentenceTransformer
+import os
+import json
+import requests
+from tqdm import tqdm
+import sqlite3
+
+# Initialize components
+@st.cache_resource
+def init_components():
+    db_handler = DatabaseHandler()
+    data_processor = DataProcessor()
+    rag_system = RAGSystem(data_processor)
+    query_rewriter = QueryRewriter()
+    evaluation_system = EvaluationSystem(data_processor, db_handler)
+    return db_handler, data_processor, rag_system, query_rewriter, evaluation_system
+
+db_handler, data_processor, rag_system, query_rewriter, evaluation_system = init_components()
+
+# Ground Truth Generation
+def generate_questions(transcript):
+    OLLAMA_HOST = os.getenv('OLLAMA_HOST', 'localhost')
+    OLLAMA_PORT = os.getenv('OLLAMA_PORT', '11434')
+    prompt_template = """
+    You are an AI assistant tasked with generating questions based on a YouTube video transcript.
+    Formulate 10 questions that a user might ask based on the provided transcript.
+    Make the questions specific to the content of the transcript.
+    The questions should be complete and not too short. Use as few words as possible from the transcript.
+
+    The transcript:
+
+    {transcript}
+
+    Provide the output in parsable JSON without using code blocks:
+
+    {{"questions": ["question1", "question2", ..., "question10"]}}
+    """.strip()
+
+    prompt = prompt_template.format(transcript=transcript)
+
+    try:
+        response = requests.post(f'http://{OLLAMA_HOST}:{OLLAMA_PORT}/api/generate', json={
+            'model': 'phi3.5',
+            'prompt': prompt,
+            'stream': False  # ask for one JSON object; the endpoint streams NDJSON by default
+        })
+        response.raise_for_status()
+        return json.loads(response.json()['response'])
+    except (requests.RequestException, json.JSONDecodeError, KeyError) as e:
+        # Also covers a model reply that is not valid JSON or lacks the 'response' field
+        st.error(f"Error generating questions: {str(e)}")
+        return None
+
+def generate_ground_truth(video_id):
+    transcript_data = get_transcript(video_id)
+
+    if transcript_data and 'transcript' in transcript_data:
+        full_transcript = " ".join([entry['text'] for entry in transcript_data['transcript']])
+        questions = generate_questions(full_transcript)
+
+        if questions and 'questions' in questions:
+            df = pd.DataFrame([(video_id, q) for q in questions['questions']], columns=['video_id', 'question'])
+
+            os.makedirs('data', exist_ok=True)
+            df.to_csv('data/ground-truth-retrieval.csv', index=False)
+            st.success("Ground truth data generated and saved to data/ground-truth-retrieval.csv")
+            return df
+        else:
+            st.error("Failed to generate questions.")
+    else:
+        st.error("Failed to generate ground truth data due to transcript retrieval error.")
+    return None
+
+# RAG Evaluation
+def evaluate_rag(sample_size=200):
+    try:
+        ground_truth = pd.read_csv('data/ground-truth-retrieval.csv')
+    except FileNotFoundError:
+        st.error("Ground truth file not found. Please generate ground truth data first.")
+        return None
+
+    sample = ground_truth.sample(n=min(sample_size, len(ground_truth)), random_state=1)
+    evaluations = []
+
+    prompt_template = """
+    You are an expert evaluator for a YouTube transcript assistant.
+    Your task is to analyze the relevance of the generated answer to the given question.
+    Based on the relevance of the generated answer, you will classify it
+    as "NON_RELEVANT", "PARTLY_RELEVANT", or "RELEVANT".
+
+    Here is the data for evaluation:
+
+    Question: {question}
+    Generated Answer: {answer_llm}
+
+    Please analyze the content and context of the generated answer in relation to the question
+    and provide your evaluation in parsable JSON without using code blocks:
+
+    {{
+      "Relevance": "NON_RELEVANT" | "PARTLY_RELEVANT" | "RELEVANT",
+      "Explanation": "[Provide a brief explanation for your evaluation]"
+    }}
+    """.strip()
+
+    progress_bar = st.progress(0)
+    for i, (_, row) in enumerate(sample.iterrows()):
+        question = row['question']
+        answer_llm = rag_system.query(question)
+        prompt = prompt_template.format(question=question, answer_llm=answer_llm)
+        evaluation = rag_system.query(prompt)  # Assuming rag_system can handle this type of query
+        try:
+            evaluation_json = json.loads(evaluation)
+            evaluations.append((row['video_id'], question, answer_llm, evaluation_json['Relevance'], evaluation_json['Explanation']))
+        except (json.JSONDecodeError, KeyError, TypeError):
+            # Skip replies that are not JSON or lack the expected keys
+            st.warning(f"Failed to parse evaluation for question: {question}")
+        progress_bar.progress((i + 1) / len(sample))
+
+    # Store RAG evaluations in the database
+    conn = sqlite3.connect('data/sqlite.db')
+    cursor = conn.cursor()
+    cursor.execute('''
+        CREATE TABLE IF NOT EXISTS rag_evaluations (
+            video_id TEXT,
+            question TEXT,
+            answer TEXT,
+            relevance TEXT,
+            explanation TEXT
+        )
+    ''')
+    cursor.executemany('''
+        INSERT INTO rag_evaluations (video_id, question, answer, relevance, explanation)
+        VALUES (?, ?, ?, ?, ?)
+    ''', evaluations)
+    conn.commit()
+    conn.close()
+
+    st.success("Evaluation complete. Results stored in the database.")
+    return evaluations
145 |
+
def main():
    st.title("YouTube Transcript RAG System")

    tab1, tab2, tab3 = st.tabs(["RAG System", "Ground Truth Generation", "Evaluation"])

    with tab1:
        st.header("RAG System")
        # Input section
        input_type = st.radio("Select input type:", ["Video URL", "Channel URL", "YouTube ID"])
        input_value = st.text_input("Enter the URL or ID:")
        embedding_model = st.selectbox("Select embedding model:", ["all-MiniLM-L6-v2", "all-mpnet-base-v2"])

        if st.button("Process"):
            with st.spinner("Processing..."):
                data_processor.embedding_model = SentenceTransformer(embedding_model)
                if input_type == "Video URL":
                    video_id = extract_video_id(input_value)
                    if video_id:
                        process_single_video(video_id, embedding_model)
                    else:
                        st.error("Failed to extract video ID from the URL")
                elif input_type == "Channel URL":
                    # NOTE: get_channel_videos() forwards its argument to the API as a
                    # channel ID, so a full channel URL will not resolve as-is.
                    channel_videos = get_channel_videos(input_value)
                    if channel_videos:
                        process_multiple_videos([video['video_id'] for video in channel_videos], embedding_model)
                    else:
                        st.error("Failed to retrieve videos from the channel")
                else:
                    process_single_video(input_value, embedding_model)

        # Query section
        st.subheader("Query the RAG System")
        query = st.text_input("Enter your query:")
        rewrite_method = st.radio("Query rewriting method:", ["None", "Chain of Thought", "ReAct"])
        search_method = st.radio("Search method:", ["Hybrid", "Text-only", "Embedding-only"])

        if st.button("Search"):
            with st.spinner("Searching..."):
                if rewrite_method == "Chain of Thought":
                    query = query_rewriter.rewrite_cot(query)
                elif rewrite_method == "ReAct":
                    query = query_rewriter.rewrite_react(query)

                search_method_map = {"Hybrid": "hybrid", "Text-only": "text", "Embedding-only": "embedding"}
                response = rag_system.query(query, search_method=search_method_map[search_method])

            # Streamlit reruns the script on every click, so a button nested under
            # "Search" would never fire. Keep the result in session state instead.
            st.session_state['last_query'] = query
            st.session_state['last_response'] = response

        if 'last_response' in st.session_state:
            st.write("Response:", st.session_state['last_response'])

            # Feedback
            feedback = st.radio("Provide feedback:", ["+1", "-1"])
            if st.button("Submit Feedback"):
                db_handler.add_user_feedback("all_videos", st.session_state['last_query'], 1 if feedback == "+1" else -1)
                st.success("Feedback submitted successfully!")

    with tab2:
        st.header("Ground Truth Generation")
        video_id = st.text_input("Enter YouTube Video ID for ground truth generation:")
        if st.button("Generate Ground Truth"):
            with st.spinner("Generating ground truth..."):
                ground_truth_df = generate_ground_truth(video_id)
                if ground_truth_df is not None:
                    st.dataframe(ground_truth_df)
                    csv = ground_truth_df.to_csv(index=False)
                    st.download_button(
                        label="Download Ground Truth CSV",
                        data=csv,
                        file_name="ground_truth.csv",
                        mime="text/csv",
                    )

    with tab3:
        st.header("RAG Evaluation")
        sample_size = st.number_input("Enter sample size for evaluation:", min_value=1, max_value=1000, value=200)
        if st.button("Run Evaluation"):
            with st.spinner("Running evaluation..."):
                evaluation_results = evaluate_rag(sample_size)
                if evaluation_results:
                    st.write("Evaluation Results:")
                    st.dataframe(pd.DataFrame(evaluation_results, columns=['Video ID', 'Question', 'Answer', 'Relevance', 'Explanation']))

def _to_int(value, default=0):
    # get_video_metadata() returns 'N/A' for hidden counts, which int() rejects
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


# Not wrapped in st.cache_data: the function writes to the database and the
# index, and a cached None from a transient failure would block later retries.
def process_single_video(video_id, embedding_model):
    # Check if the video has already been processed with the current embedding model
    existing_index = db_handler.get_elasticsearch_index(video_id, embedding_model)
    if existing_index:
        st.info(f"Video {video_id} has already been processed with {embedding_model}. Using existing index: {existing_index}")
        return existing_index

    transcript_data = get_transcript(video_id)
    if transcript_data:
        # get_video_metadata() returns None when the API lookup fails
        metadata = transcript_data.get('metadata') or {}

        # Store video metadata in the database
        video_data = {
            'video_id': video_id,
            'title': metadata.get('title', 'Unknown Title'),
            'author': metadata.get('author', 'Unknown Author'),
            'upload_date': metadata.get('upload_date', 'Unknown Date'),
            'view_count': _to_int(metadata.get('view_count', 0)),
            'like_count': _to_int(metadata.get('like_count', 0)),
            'comment_count': _to_int(metadata.get('comment_count', 0)),
            'video_duration': metadata.get('duration', 'Unknown Duration')
        }
        db_handler.add_video(video_data)

        # Store transcript segments in the database
        for i, segment in enumerate(transcript_data['transcript']):
            segment_data = {
                'segment_id': f"{video_id}_{i}",
                'video_id': video_id,
                'content': segment.get('text', ''),
                'start_time': segment.get('start', 0),
                'duration': segment.get('duration', 0)
            }
            db_handler.add_transcript_segment(segment_data)

        # Process transcript for RAG system
        data_processor.process_transcript(video_id, transcript_data)

        # Create Elasticsearch index (Elasticsearch only accepts lowercase index names)
        index_name = f"video_{video_id}_{embedding_model}".lower()
        data_processor.build_index(index_name)

        # Store Elasticsearch index information
        db_handler.add_elasticsearch_index(video_id, index_name, embedding_model)

        st.success(f"Processed and indexed transcript for video {video_id}")
        st.write("Metadata:", metadata)
        return index_name
    else:
        st.error(f"Failed to retrieve transcript for video {video_id}")
        return None

# Not cached for the same reason as process_single_video (side effects, retries)
def process_multiple_videos(video_ids, embedding_model):
    indices = []
    for video_id in video_ids:
        index = process_single_video(video_id, embedding_model)
        if index:
            indices.append(index)
    st.success(f"Processed and indexed transcripts for {len(indices)} videos")
    return indices

if __name__ == "__main__":
    main()
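The evaluation step above depends on the model returning clean JSON, which small local models do not always do. The following is a minimal sketch of a more tolerant parser, not part of this commit: `parse_evaluation` and `ALLOWED_LABELS` are hypothetical names, and the approach (keep the outermost `{...}` and validate the label) is one assumption about how to handle noisy replies.

```python
import json

# Labels the judge prompt asks for; anything else is treated as unusable.
ALLOWED_LABELS = {"NON_RELEVANT", "PARTLY_RELEVANT", "RELEVANT"}


def parse_evaluation(raw):
    """Return (relevance, explanation), or None if the reply cannot be used."""
    if not isinstance(raw, str):
        return None
    # Models often wrap the JSON in prose; keep only the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    relevance = data.get("Relevance")
    if relevance not in ALLOWED_LABELS:
        return None
    return relevance, str(data.get("Explanation", ""))


print(parse_evaluation('Sure! {"Relevance": "RELEVANT", "Explanation": "on topic"}'))
```

A caller could skip rows where this returns `None` instead of aborting the whole evaluation run.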
app/minsearch.py
ADDED
@@ -0,0 +1,96 @@
import pandas as pd

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

import numpy as np


class Index:
    """
    A simple search index using TF-IDF and cosine similarity for text fields and exact matching for keyword fields.

    Attributes:
        text_fields (list): List of text field names to index.
        keyword_fields (list): List of keyword field names to index.
        vectorizers (dict): Dictionary of TfidfVectorizer instances for each text field.
        keyword_df (pd.DataFrame): DataFrame containing keyword field data.
        text_matrices (dict): Dictionary of TF-IDF matrices for each text field.
        docs (list): List of documents indexed.
    """

    def __init__(self, text_fields, keyword_fields, vectorizer_params=None):
        """
        Initializes the Index with specified text and keyword fields.

        Args:
            text_fields (list): List of text field names to index.
            keyword_fields (list): List of keyword field names to index.
            vectorizer_params (dict): Optional parameters to pass to TfidfVectorizer.
        """
        self.text_fields = text_fields
        self.keyword_fields = keyword_fields

        # Avoid a shared mutable default argument
        vectorizer_params = vectorizer_params or {}
        self.vectorizers = {field: TfidfVectorizer(**vectorizer_params) for field in text_fields}
        self.keyword_df = None
        self.text_matrices = {}
        self.docs = []

    def fit(self, docs):
        """
        Fits the index with the provided documents.

        Args:
            docs (list of dict): List of documents to index. Each document is a dictionary.
        """
        self.docs = docs
        keyword_data = {field: [] for field in self.keyword_fields}

        for field in self.text_fields:
            texts = [doc.get(field, '') for doc in docs]
            self.text_matrices[field] = self.vectorizers[field].fit_transform(texts)

        for doc in docs:
            for field in self.keyword_fields:
                keyword_data[field].append(doc.get(field, ''))

        self.keyword_df = pd.DataFrame(keyword_data)

        return self

    def search(self, query, filter_dict=None, boost_dict=None, num_results=10):
        """
        Searches the index with the given query, filters, and boost parameters.

        Args:
            query (str): The search query string.
            filter_dict (dict): Dictionary of keyword fields to filter by. Keys are field names and values are the values to filter by.
            boost_dict (dict): Dictionary of boost scores for text fields. Keys are field names and values are the boost scores.
            num_results (int): The number of top results to return. Defaults to 10.

        Returns:
            list of dict: List of documents matching the search criteria, ranked by relevance.
        """
        filter_dict = filter_dict or {}
        boost_dict = boost_dict or {}

        query_vecs = {field: self.vectorizers[field].transform([query]) for field in self.text_fields}
        scores = np.zeros(len(self.docs))

        # Compute cosine similarity for each text field and apply boost
        for field, query_vec in query_vecs.items():
            sim = cosine_similarity(query_vec, self.text_matrices[field]).flatten()
            boost = boost_dict.get(field, 1)
            scores += sim * boost

        # Apply keyword filters
        for field, value in filter_dict.items():
            if field in self.keyword_fields:
                mask = self.keyword_df[field] == value
                scores = scores * mask.to_numpy()

        # np.argpartition raises if asked for more items than exist, so clamp first
        n = min(num_results, len(self.docs))
        if n <= 0:
            return []

        # Use argpartition to get the top n indices, then sort them by score
        top_indices = np.argpartition(scores, -n)[-n:]
        top_indices = top_indices[np.argsort(-scores[top_indices])]

        # Filter out zero-score results
        top_docs = [self.docs[i] for i in top_indices if scores[i] > 0]

        return top_docs
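To make the scoring flow in `Index.search` easier to follow, here is a dependency-free sketch of the same sequence: score each text field, apply the boost, drop filtered documents, keep the top results. It is an illustration only. It uses raw term counts rather than TF-IDF weights, filters on any field rather than only declared keyword fields, and `toy_search` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_search(docs, query, text_fields, filter_dict=None, boost_dict=None, num_results=10):
    filter_dict = filter_dict or {}
    boost_dict = boost_dict or {}
    query_counts = Counter(query.lower().split())
    scored = []
    for doc in docs:
        # Exact-match filter, like the keyword mask in Index.search
        if any(doc.get(field) != value for field, value in filter_dict.items()):
            continue
        score = sum(
            boost_dict.get(field, 1) * cosine(query_counts, Counter(doc.get(field, "").lower().split()))
            for field in text_fields
        )
        if score > 0:  # zero-score documents are dropped, as in the original
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]


docs = [
    {"video_id": "a", "content": "python list comprehension tutorial"},
    {"video_id": "a", "content": "cooking pasta at home"},
    {"video_id": "b", "content": "python tutorial"},
]
print(toy_search(docs, "python tutorial", ["content"]))
print(toy_search(docs, "python tutorial", ["content"], filter_dict={"video_id": "a"}))
```

Without a filter the exact-match document from video `b` ranks first; with the `video_id` filter only the matching segment from video `a` remains.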
app/query_rewriter.py
ADDED
@@ -0,0 +1,33 @@
import ollama

class QueryRewriter:
    def __init__(self):
        # Use the same Ollama tag as RAGSystem; the bare "phi" tag is a different, older model
        self.model = "phi3.5"  # Using Phi-3.5 model

    def rewrite_cot(self, query):
        prompt = f"""
        Rewrite the following query using Chain-of-Thought reasoning:
        Query: {query}

        Rewritten query:
        """
        response = ollama.generate(model=self.model, prompt=prompt)
        return response['response'].strip()

    def rewrite_react(self, query):
        prompt = f"""
        Rewrite the following query using the ReAct framework (Reasoning and Acting):
        Query: {query}

        Thought 1:
        Action 1:
        Observation 1:

        Thought 2:
        Action 2:
        Observation 2:

        Final rewritten query:
        """
        response = ollama.generate(model=self.model, prompt=prompt)
        return response['response'].strip()
app/rag.py
ADDED
@@ -0,0 +1,31 @@
import ollama

class RAGSystem:
    def __init__(self, data_processor):
        self.data_processor = data_processor
        self.model = "phi3.5"  # Using Phi-3.5 model

    def query(self, user_query, top_k=3, search_method='hybrid'):
        # Retrieve relevant documents using the specified search method
        relevant_docs = self.data_processor.search(user_query, num_results=top_k, method=search_method)

        # Construct the prompt
        context = "\n".join([doc['content'] for doc in relevant_docs])
        prompt = f"Context: {context}\n\nQuestion: {user_query}\n\nAnswer:"

        # Generate response using Ollama
        response = ollama.generate(model=self.model, prompt=prompt)

        return response['response']

    def rerank_documents(self, documents, query):
        # Implement a simple re-ranking strategy
        # This could be improved with more sophisticated methods
        reranked = sorted(documents, key=lambda doc: self.calculate_relevance(doc['content'], query), reverse=True)
        return reranked

    def calculate_relevance(self, document, query):
        # Simple relevance calculation based on word overlap:
        # the fraction of distinct query words that appear in the document
        doc_words = set(document.lower().split())
        query_words = set(query.lower().split())
        if not query_words:
            return 0.0  # An empty query would otherwise divide by zero
        return len(doc_words.intersection(query_words)) / len(query_words)
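The word-overlap score used by `calculate_relevance` is the fraction of distinct query words that also occur in the document. A small standalone sketch shows how it orders a few segments; the sample documents and the free function `overlap_score` are made up for illustration and only mirror the method's logic.

```python
def overlap_score(document, query):
    """Fraction of distinct query words that also occur in the document."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    doc_words = set(document.lower().split())
    return len(doc_words & query_words) / len(query_words)


docs = [
    {"content": "Transformers use attention"},
    {"content": "How attention works in transformers explained"},
    {"content": "Baking bread"},
]
query = "how attention works"

# Highest overlap first, as rerank_documents does
ranked = sorted(docs, key=lambda d: overlap_score(d["content"], query), reverse=True)
for doc in ranked:
    print(round(overlap_score(doc["content"], query), 2), doc["content"])
```

Because the score compares whitespace-separated tokens, punctuation attached to a word ("attention," versus "attention") prevents a match, which is one reason the comment in the source calls the strategy simple.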
app/rag_evaluation.py
ADDED
@@ -0,0 +1,193 @@
import pandas as pd
import numpy as np
from tqdm import tqdm
import json
import requests
import sqlite3
from minsearch import Index

# Database connection
conn = sqlite3.connect('data/sqlite.db')
cursor = conn.cursor()

# Load ground truth data from CSV
def load_ground_truth():
    return pd.read_csv('data/ground-truth-retrieval.csv')

ground_truth = load_ground_truth()

# Load transcript data
def load_transcripts():
    cursor.execute("SELECT * FROM transcript_segments")
    rows = cursor.fetchall()
    return pd.DataFrame(rows, columns=['segment_id', 'video_id', 'content', 'start_time', 'duration'])

transcripts = load_transcripts()

# Create index
index = Index(
    text_fields=['content'],
    keyword_fields=['video_id', 'segment_id']
)
index.fit(transcripts.to_dict('records'))

# RAG flow
def search(query, boost=None):
    # boost is optional so that objective() can pass candidate boost values
    results = index.search(
        query=query,
        filter_dict={},
        boost_dict=boost or {},
        num_results=10
    )
    return results

prompt_template = """
You're an AI assistant for YouTube video transcripts. Answer the QUESTION based on the CONTEXT from our transcript database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()

def build_prompt(query, search_results):
    context = "\n\n".join([f"Segment {i+1}: {result['content']}" for i, result in enumerate(search_results)])
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

def llm(prompt):
    response = requests.post('http://localhost:11434/api/generate', json={
        'model': 'phi',
        'prompt': prompt,
        # Ollama streams newline-delimited JSON by default; ask for a single object
        'stream': False
    })
    if response.status_code == 200:
        return response.json()['response']
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

# Evaluation metrics
def hit_rate(relevance_total):
    return sum(any(line) for line in relevance_total) / len(relevance_total)

def mrr(relevance_total):
    scores = []
    for line in relevance_total:
        for rank, relevant in enumerate(line, 1):
            if relevant:
                scores.append(1 / rank)
                break
        else:
            # No relevant result for this query
            scores.append(0)
    return sum(scores) / len(scores)

def evaluate(ground_truth, search_function):
    relevance_total = []
    for _, row in tqdm(ground_truth.iterrows(), total=len(ground_truth)):
        video_id = row['video_id']
        results = search_function(row['question'])
        relevance = [d['video_id'] == video_id for d in results]
        relevance_total.append(relevance)
    return {
        'hit_rate': hit_rate(relevance_total),
        'mrr': mrr(relevance_total),
    }

# Parameter optimization
param_ranges = {
    'content': (0.0, 3.0),
}

def simple_optimize(param_ranges, objective_function, n_iterations=10):
    # Random search: sample each parameter uniformly and keep the best score
    best_params = None
    best_score = float('-inf')
    for _ in range(n_iterations):
        current_params = {param: np.random.uniform(min_val, max_val)
                          for param, (min_val, max_val) in param_ranges.items()}
        current_score = objective_function(current_params)
        if current_score > best_score:
            best_score = current_score
            best_params = current_params
    return best_params, best_score

def objective(boost_params):
    def search_function(q):
        return search(q, boost_params)
    results = evaluate(ground_truth, search_function)
    return results['mrr']

# RAG evaluation
prompt2_template = """
You are an expert evaluator for a YouTube transcript assistant.
Your task is to analyze the relevance of the generated answer to the given question.
Based on the relevance of the generated answer, you will classify it
as "NON_RELEVANT", "PARTLY_RELEVANT", or "RELEVANT".

Here is the data for evaluation:

Question: {question}
Generated Answer: {answer_llm}

Please analyze the content and context of the generated answer in relation to the question
and provide your evaluation in parsable JSON without using code blocks:

{{
  "Relevance": "NON_RELEVANT" | "PARTLY_RELEVANT" | "RELEVANT",
  "Explanation": "[Provide a brief explanation for your evaluation]"
}}
""".strip()

def evaluate_rag(sample_size=200):
    # DataFrame.sample raises if n exceeds the number of rows, so clamp it
    sample = ground_truth.sample(n=min(sample_size, len(ground_truth)), random_state=1)
    evaluations = []
    for _, row in tqdm(sample.iterrows(), total=len(sample)):
        question = row['question']
        answer_llm = rag(question)
        prompt = prompt2_template.format(question=question, answer_llm=answer_llm)
        evaluation = llm(prompt)
        try:
            evaluation = json.loads(evaluation)
            evaluations.append((row['video_id'], question, answer_llm, evaluation['Relevance'], evaluation['Explanation']))
        except (TypeError, json.JSONDecodeError, KeyError):
            # llm() returned None, invalid JSON, or JSON without the expected keys
            print(f"Failed to parse evaluation for question: {question}")
    return evaluations

# Main execution
if __name__ == "__main__":
    print("Evaluating search performance...")
    # evaluate() passes the question string itself to the search function
    search_performance = evaluate(ground_truth, search)
    print(f"Search performance: {search_performance}")

    print("\nOptimizing search parameters...")
    best_params, best_score = simple_optimize(param_ranges, objective, n_iterations=20)
    print(f"Best parameters: {best_params}")
    print(f"Best score: {best_score}")

    print("\nEvaluating RAG performance...")
    rag_evaluations = evaluate_rag(sample_size=200)

    # Store RAG evaluations in the database
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS rag_evaluations (
        video_id TEXT,
        question TEXT,
        answer TEXT,
        relevance TEXT,
        explanation TEXT
    )
    ''')
    cursor.executemany('''
    INSERT INTO rag_evaluations (video_id, question, answer, relevance, explanation)
    VALUES (?, ?, ?, ?, ?)
    ''', rag_evaluations)
    conn.commit()

    print("Evaluation complete. Results stored in the database.")

    # Close the database connection
    conn.close()
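The two retrieval metrics in this file are easiest to check on a tiny hand-made input. The sketch below re-states `hit_rate` and `mrr` so it runs on its own and applies them to four invented queries; the relevance lists are illustrative data, not results from this project.

```python
def hit_rate(relevance_total):
    # Share of queries with at least one relevant result
    return sum(any(line) for line in relevance_total) / len(relevance_total)


def mrr(relevance_total):
    # Mean of 1/rank of the first relevant result (0 when there is none)
    total = 0.0
    for line in relevance_total:
        for rank, relevant in enumerate(line, 1):
            if relevant:
                total += 1 / rank
                break
    return total / len(relevance_total)


# One row per query; each flag says whether the result at that rank
# came from the expected video.
relevance_total = [
    [True, False, False, False],   # first hit at rank 1 -> 1.0
    [False, True, False, False],   # first hit at rank 2 -> 0.5
    [False, False, False, False],  # no hit              -> 0.0
    [False, False, False, True],   # first hit at rank 4 -> 0.25
]

print(hit_rate(relevance_total))  # 3 of 4 queries have a hit: 0.75
print(mrr(relevance_total))       # (1.0 + 0.5 + 0.0 + 0.25) / 4 = 0.4375
```

Note that `evaluate` counts a result as relevant when it comes from the right video, not the right segment, so both metrics are measured at video granularity.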
app/transcript_extractor.py
ADDED
@@ -0,0 +1,85 @@
from youtube_transcript_api import YouTubeTranscriptApi
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
import re
import os

# Replace with your actual API key
API_KEY = os.environ.get('YOUTUBE_API_KEY', 'YOUR_API_KEY_HERE')

youtube = build('youtube', 'v3', developerKey=API_KEY)

def extract_video_id(url):
    video_id_match = re.search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", url)
    if video_id_match:
        return video_id_match.group(1)
    return None

def get_video_metadata(video_id):
    try:
        request = youtube.videos().list(
            part="snippet,contentDetails,statistics",
            id=video_id
        )
        response = request.execute()

        if 'items' in response and len(response['items']) > 0:
            video = response['items'][0]
            snippet = video['snippet']
            return {
                'title': snippet['title'],
                'author': snippet['channelTitle'],
                'upload_date': snippet['publishedAt'],
                'view_count': video['statistics']['viewCount'],
                'like_count': video['statistics'].get('likeCount', 'N/A'),
                'comment_count': video['statistics'].get('commentCount', 'N/A'),
                'duration': video['contentDetails']['duration']
            }
        else:
            return None
    except HttpError as e:
        print(f"An HTTP error {e.resp.status} occurred: {e.content}")
        return None

def get_transcript(video_id):
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        metadata = get_video_metadata(video_id)
        return {
            'transcript': transcript,
            'metadata': metadata
        }
    except Exception as e:
        print(f"Error extracting transcript for video {video_id}: {str(e)}")
        return None

def get_channel_videos(channel_id):
    try:
        request = youtube.search().list(
            part="id,snippet",
            channelId=channel_id,
            type="video",
            maxResults=50  # Adjust as needed
        )
        response = request.execute()

        videos = []
        for item in response['items']:
            videos.append({
                'video_id': item['id']['videoId'],
                'title': item['snippet']['title'],
                'description': item['snippet']['description'],
                'published_at': item['snippet']['publishedAt']
            })
        return videos
    except HttpError as e:
        print(f"An HTTP error {e.resp.status} occurred: {e.content}")
        return []

def process_videos(video_ids):
    transcripts = {}
    for video_id in video_ids:
        transcript_data = get_transcript(video_id)
        if transcript_data:
            transcripts[video_id] = transcript_data
    return transcripts
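The regular expression in `extract_video_id` accepts the first 11-character run that follows `v=` or a slash, so a non-YouTube URL such as `https://example.com/abcdefghijk` would also yield an "ID". As a hedged alternative, the sketch below parses the URL with the standard library and only accepts known YouTube hosts. The function name `extract_video_id_strict` and the list of supported path prefixes are assumptions for illustration, not part of this commit.

```python
import re
from urllib.parse import urlparse, parse_qs

_ID_RE = re.compile(r"^[0-9A-Za-z_-]{11}$")


def extract_video_id_strict(url):
    """Return the 11-character video ID from a YouTube URL, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Path-style links such as /embed/<id> or /shorts/<id>
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"embed", "shorts", "live"}:
                candidate = parts[1]
    if candidate and _ID_RE.match(candidate):
        return candidate
    return None


for url in (
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ?t=10",
    "https://example.com/abcdefghijk",
):
    print(url, "->", extract_video_id_strict(url))
```

The stricter version trades flexibility for fewer false positives: bare IDs and unusual URL shapes return `None` and would need their own input path, which the app already provides through the "YouTube ID" option.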
config/config.yaml
ADDED
File without changes
|
data/sqlite.db
ADDED
Binary file (32.8 kB)
docker-compose.yaml
ADDED
@@ -0,0 +1,42 @@
version: '3.8'

services:
  app:
    build: .
    ports:
      - "8501:8501"
    depends_on:
      - elasticsearch
    environment:
      - ELASTICSEARCH_HOST=elasticsearch
      - ELASTICSEARCH_PORT=9200
      - YOUTUBE_API_KEY=${YOUTUBE_API_KEY}
    env_file:
      - .env
    volumes:
      - ./data:/app/data
      - ./config:/app/config

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data

  grafana:
    image: grafana/grafana:latest
    environment:
      # Assumption: the SQLite datasource is provided by this community plugin
      - GF_INSTALL_PLUGINS=frser-sqlite-datasource
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
      # The provisioning files in this commit live under ./grafana/provisioning
      - ./grafana/provisioning:/etc/grafana/provisioning
      # The datasource reads /app/data/sqlite.db, so Grafana needs the data directory too
      - ./data:/app/data
    depends_on:
      - elasticsearch

volumes:
  esdata:
  grafana-storage:
grafana/provisioning/dashboards/rag_evaluation.json
ADDED
@@ -0,0 +1,129 @@
1 |
+
{
|
2 |
+
"annotations": {
|
3 |
+
"list": [
|
4 |
+
{
|
5 |
+
"builtIn": 1,
|
6 |
+
"datasource": "-- Grafana --",
|
7 |
+
"enable": true,
|
8 |
+
"hide": true,
|
9 |
+
"iconColor": "rgba(0, 211, 255, 1)",
|
10 |
+
"name": "Annotations & Alerts",
|
11 |
+
"type": "dashboard"
|
12 |
+
}
|
13 |
+
]
|
14 |
+
},
|
15 |
+
"editable": true,
|
16 |
+
"gnetId": null,
|
17 |
+
"graphTooltip": 0,
|
18 |
+
"id": 1,
|
19 |
+
"links": [],
|
20 |
+
"panels": [
|
21 |
+
{
|
22 |
+
"aliasColors": {},
|
23 |
+
"bars": false,
|
24 |
+
"dashLength": 10,
|
25 |
+
"dashes": false,
|
26 |
+
"datasource": "SQLite",
|
27 |
+
"fieldConfig": {
|
28 |
+
"defaults": {},
|
29 |
+
"overrides": []
|
30 |
+
},
|
31 |
+
"fill": 1,
|
32 |
+
"fillGradient": 0,
|
33 |
+
"gridPos": {
|
34 |
+
"h": 9,
|
35 |
+
"w": 12,
|
36 |
+
"x": 0,
|
37 |
+
"y": 0
|
38 |
+
},
|
39 |
+
"hiddenSeries": false,
|
40 |
+
"id": 2,
|
41 |
+
"legend": {
|
42 |
+
"avg": false,
|
43 |
+
"current": false,
|
44 |
+
"max": false,
|
45 |
+
"min": false,
|
46 |
+
"show": true,
|
47 |
+
"total": false,
|
48 |
+
"values": false
|
49 |
+
},
|
50 |
+
"lines": true,
|
51 |
+
"linewidth": 1,
|
52 |
+
"nullPointMode": "null",
|
53 |
+
"options": {
|
54 |
+
"alertThreshold": true
|
55 |
+
},
|
56 |
+
"percentage": false,
|
57 |
+
"pluginVersion": "7.5.7",
|
58 |
+
"pointradius": 2,
|
59 |
+
"points": false,
|
60 |
+
"renderer": "flot",
|
61 |
+
"seriesOverrides": [],
|
62 |
+
"spaceLength": 10,
|
63 |
+
"stack": false,
|
64 |
+
"steppedLine": false,
|
65 |
+
"targets": [
|
66 |
+
{
|
67 |
+
"queryType": "table",
|
68 |
+
"refId": "A",
|
69 |
+
"sql": "SELECT relevance, COUNT(*) as count FROM rag_evaluations GROUP BY relevance"
|
70 |
+
}
|
71 |
+
],
|
72 |
+
"thresholds": [],
|
73 |
+
"timeFrom": null,
|
74 |
+
"timeRegions": [],
|
75 |
+
"timeShift": null,
|
76 |
+
"title": "RAG Evaluation Results",
|
77 |
+
"tooltip": {
|
78 |
+
"shared": true,
|
79 |
+
"sort": 0,
|
80 |
+
"value_type": "individual"
|
81 |
+
},
|
82 |
+
"type": "graph",
|
83 |
+
"xaxis": {
|
84 |
+
"buckets": null,
|
85 |
+
"mode": "categories",
|
86 |
+
"name": null,
|
87 |
+
"show": true,
|
88 |
+
"values": []
|
89 |
+
},
|
90 |
+
"yaxes": [
|
91 |
+
{
|
92 |
+
"format": "short",
|
93 |
+
"label": null,
|
94 |
+
"logBase": 1,
|
95 |
+
"max": null,
|
96 |
+
"min": null,
|
97 |
+
"show": true
|
98 |
+
},
|
99 |
+
{
|
100 |
+
"format": "short",
|
101 |
+
"label": null,
|
102 |
+
"logBase": 1,
|
103 |
+
"max": null,
|
104 |
+
"min": null,
|
105 |
+
"show": true
|
106 |
+
}
|
107 |
+
],
|
108 |
+
"yaxis": {
|
109 |
+
"align": false,
|
110 |
+
"alignLevel": null
|
111 |
+
}
|
112 |
+
}
|
113 |
+
],
|
114 |
+
"schemaVersion": 27,
|
115 |
+
"style": "dark",
|
116 |
+
"tags": [],
|
117 |
+
"templating": {
|
118 |
+
"list": []
|
119 |
+
},
|
120 |
+
"time": {
|
121 |
+
"from": "now-6h",
|
122 |
+
"to": "now"
|
123 |
+
},
|
124 |
+
"timepicker": {},
|
125 |
+
"timezone": "",
|
126 |
+
"title": "RAG Evaluation Dashboard",
|
127 |
+
"uid": "rag_evaluation",
|
128 |
+
"version": 1
|
129 |
+
}
|
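The panel above charts the result of a single grouped SQL query. As a minimal sketch of what that query returns, the snippet below runs it against an in-memory SQLite database. The table schema and the relevance labels (`RELEVANT`, `PARTLY_RELEVANT`, `NON_RELEVANT`) are assumptions made for illustration; only the query string itself is taken from the dashboard JSON.

```python
import sqlite3

# Query copied verbatim from the dashboard panel's "sql" target.
PANEL_SQL = "SELECT relevance, COUNT(*) as count FROM rag_evaluations GROUP BY relevance"


def relevance_counts(conn: sqlite3.Connection) -> dict:
    """Run the panel query and return a {relevance: count} mapping."""
    return {relevance: count for relevance, count in conn.execute(PANEL_SQL)}


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory stand-in for rag_evaluations (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rag_evaluations (id INTEGER PRIMARY KEY, relevance TEXT)")
    conn.executemany(
        "INSERT INTO rag_evaluations (relevance) VALUES (?)",
        [("RELEVANT",), ("RELEVANT",), ("PARTLY_RELEVANT",), ("NON_RELEVANT",)],
    )
    return conn


if __name__ == "__main__":
    print(relevance_counts(demo_connection()))
```

Each key in the returned mapping corresponds to one category on the panel's x-axis (`"mode": "categories"`), and each value to the bar height.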
grafana/provisioning/datasources/sqlite.yaml
ADDED
@@ -0,0 +1,7 @@
+apiVersion: 1
+
+datasources:
+  - name: SQLite
+    type: sqlite
+    url: /app/data/sqlite.db
+    isDefault: true
llmrag/Scripts/Activate.ps1
ADDED
@@ -0,0 +1,472 @@
|
1 |
+
<#
|
2 |
+
.Synopsis
|
3 |
+
Activate a Python virtual environment for the current PowerShell session.
|
4 |
+
|
5 |
+
.Description
|
6 |
+
Pushes the python executable for a virtual environment to the front of the
|
7 |
+
$Env:PATH environment variable and sets the prompt to signify that you are
|
8 |
+
in a Python virtual environment. Makes use of the command line switches as
|
9 |
+
well as the `pyvenv.cfg` file values present in the virtual environment.
|
10 |
+
|
11 |
+
.Parameter VenvDir
|
12 |
+
Path to the directory that contains the virtual environment to activate. The
|
13 |
+
default value for this is the parent of the directory that the Activate.ps1
|
14 |
+
script is located within.
|
15 |
+
|
16 |
+
.Parameter Prompt
|
17 |
+
The prompt prefix to display when this virtual environment is activated. By
|
18 |
+
default, this prompt is the name of the virtual environment folder (VenvDir)
|
19 |
+
surrounded by parentheses and followed by a single space (ie. '(.venv) ').
|
20 |
+
|
21 |
+
.Example
|
22 |
+
Activate.ps1
|
23 |
+
Activates the Python virtual environment that contains the Activate.ps1 script.
|
24 |
+
|
25 |
+
.Example
|
26 |
+
Activate.ps1 -Verbose
|
27 |
+
Activates the Python virtual environment that contains the Activate.ps1 script,
|
28 |
+
and shows extra information about the activation as it executes.
|
29 |
+
|
30 |
+
.Example
|
31 |
+
Activate.ps1 -VenvDir C:\Users\MyUser\Common\.venv
|
32 |
+
Activates the Python virtual environment located in the specified location.
|
33 |
+
|
34 |
+
.Example
|
35 |
+
Activate.ps1 -Prompt "MyPython"
|
36 |
+
Activates the Python virtual environment that contains the Activate.ps1 script,
|
37 |
+
and prefixes the current prompt with the specified string (surrounded in
|
38 |
+
parentheses) while the virtual environment is active.
|
39 |
+
|
40 |
+
.Notes
|
41 |
+
On Windows, it may be required to enable this Activate.ps1 script by setting the
|
42 |
+
execution policy for the user. You can do this by issuing the following PowerShell
|
43 |
+
command:
|
44 |
+
|
45 |
+
PS C:\> Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
|
46 |
+
|
47 |
+
For more information on Execution Policies:
|
48 |
+
https://go.microsoft.com/fwlink/?LinkID=135170
|
49 |
+
|
50 |
+
#>
|
51 |
+
Param(
|
52 |
+
[Parameter(Mandatory = $false)]
|
53 |
+
[String]
|
54 |
+
$VenvDir,
|
55 |
+
[Parameter(Mandatory = $false)]
|
56 |
+
[String]
|
57 |
+
$Prompt
|
58 |
+
)
|
59 |
+
|
60 |
+
<# Function declarations --------------------------------------------------- #>
|
61 |
+
|
62 |
+
<#
|
63 |
+
.Synopsis
|
64 |
+
Remove all shell session elements added by the Activate script, including the
|
65 |
+
addition of the virtual environment's Python executable from the beginning of
|
66 |
+
the PATH variable.
|
67 |
+
|
68 |
+
.Parameter NonDestructive
|
69 |
+
If present, do not remove this function from the global namespace for the
|
70 |
+
session.
|
71 |
+
|
72 |
+
#>
|
73 |
+
function global:deactivate ([switch]$NonDestructive) {
|
74 |
+
# Revert to original values
|
75 |
+
|
76 |
+
# The prior prompt:
|
77 |
+
if (Test-Path -Path Function:_OLD_VIRTUAL_PROMPT) {
|
78 |
+
Copy-Item -Path Function:_OLD_VIRTUAL_PROMPT -Destination Function:prompt
|
79 |
+
Remove-Item -Path Function:_OLD_VIRTUAL_PROMPT
|
80 |
+
}
|
81 |
+
|
82 |
+
# The prior PYTHONHOME:
|
83 |
+
if (Test-Path -Path Env:_OLD_VIRTUAL_PYTHONHOME) {
|
84 |
+
Copy-Item -Path Env:_OLD_VIRTUAL_PYTHONHOME -Destination Env:PYTHONHOME
|
85 |
+
Remove-Item -Path Env:_OLD_VIRTUAL_PYTHONHOME
|
86 |
+
}
|
87 |
+
|
88 |
+
# The prior PATH:
|
89 |
+
if (Test-Path -Path Env:_OLD_VIRTUAL_PATH) {
|
90 |
+
Copy-Item -Path Env:_OLD_VIRTUAL_PATH -Destination Env:PATH
|
91 |
+
Remove-Item -Path Env:_OLD_VIRTUAL_PATH
|
92 |
+
}
|
93 |
+
|
94 |
+
# Just remove the VIRTUAL_ENV altogether:
|
95 |
+
if (Test-Path -Path Env:VIRTUAL_ENV) {
|
96 |
+
Remove-Item -Path env:VIRTUAL_ENV
|
97 |
+
}
|
98 |
+
|
99 |
+
# Just remove VIRTUAL_ENV_PROMPT altogether.
|
100 |
+
if (Test-Path -Path Env:VIRTUAL_ENV_PROMPT) {
|
101 |
+
Remove-Item -Path env:VIRTUAL_ENV_PROMPT
|
102 |
+
}
|
103 |
+
|
104 |
+
# Just remove the _PYTHON_VENV_PROMPT_PREFIX altogether:
|
105 |
+
if (Get-Variable -Name "_PYTHON_VENV_PROMPT_PREFIX" -ErrorAction SilentlyContinue) {
|
106 |
+
Remove-Variable -Name _PYTHON_VENV_PROMPT_PREFIX -Scope Global -Force
|
107 |
+
}
|
108 |
+
|
109 |
+
# Leave deactivate function in the global namespace if requested:
|
110 |
+
if (-not $NonDestructive) {
|
111 |
+
Remove-Item -Path function:deactivate
|
112 |
+
}
|
113 |
+
}
|
114 |
+
|
115 |
+
<#
|
116 |
+
.Description
|
117 |
+
Get-PyVenvConfig parses the values from the pyvenv.cfg file located in the
|
118 |
+
given folder, and returns them in a map.
|
119 |
+
|
120 |
+
For each line in the pyvenv.cfg file, if that line can be parsed into exactly
|
121 |
+
two strings separated by `=` (with any amount of whitespace surrounding the =)
|
122 |
+
then it is considered a `key = value` line. The left hand string is the key,
|
123 |
+
the right hand is the value.
|
124 |
+
|
125 |
+
If the value starts with a `'` or a `"` then the first and last character is
|
126 |
+
stripped from the value before being captured.
|
127 |
+
|
128 |
+
.Parameter ConfigDir
|
129 |
+
Path to the directory that contains the `pyvenv.cfg` file.
|
130 |
+
#>
|
131 |
+
function Get-PyVenvConfig(
|
132 |
+
[String]
|
133 |
+
$ConfigDir
|
134 |
+
) {
|
135 |
+
Write-Verbose "Given ConfigDir=$ConfigDir, obtain values in pyvenv.cfg"
|
136 |
+
|
137 |
+
# Ensure the file exists, and issue a warning if it doesn't (but still allow the function to continue).
|
138 |
+
$pyvenvConfigPath = Join-Path -Resolve -Path $ConfigDir -ChildPath 'pyvenv.cfg' -ErrorAction Continue
|
139 |
+
|
140 |
+
# An empty map will be returned if no config file is found.
|
141 |
+
$pyvenvConfig = @{ }
|
142 |
+
|
143 |
+
if ($pyvenvConfigPath) {
|
144 |
+
|
145 |
+
Write-Verbose "File exists, parse `key = value` lines"
|
146 |
+
$pyvenvConfigContent = Get-Content -Path $pyvenvConfigPath
|
147 |
+
|
148 |
+
$pyvenvConfigContent | ForEach-Object {
|
149 |
+
$keyval = $PSItem -split "\s*=\s*", 2
|
150 |
+
if ($keyval[0] -and $keyval[1]) {
|
151 |
+
$val = $keyval[1]
|
152 |
+
|
153 |
+
# Remove extraneous quotations around a string value.
|
154 |
+
if ("'""".Contains($val.Substring(0, 1))) {
|
155 |
+
$val = $val.Substring(1, $val.Length - 2)
|
156 |
+
}
|
157 |
+
|
158 |
+
$pyvenvConfig[$keyval[0]] = $val
|
159 |
+
Write-Verbose "Adding Key: '$($keyval[0])'='$val'"
|
160 |
+
}
|
161 |
+
}
|
162 |
+
}
|
163 |
+
return $pyvenvConfig
|
164 |
+
}
|
165 |
+
|
166 |
+
|
167 |
+
<# Begin Activate script --------------------------------------------------- #>
|
168 |
+
|
169 |
+
# Determine the containing directory of this script
|
170 |
+
$VenvExecPath = Split-Path -Parent $MyInvocation.MyCommand.Definition
|
171 |
+
$VenvExecDir = Get-Item -Path $VenvExecPath
|
172 |
+
|
173 |
+
Write-Verbose "Activation script is located in path: '$VenvExecPath'"
|
174 |
+
Write-Verbose "VenvExecDir Fullname: '$($VenvExecDir.FullName)"
|
175 |
+
Write-Verbose "VenvExecDir Name: '$($VenvExecDir.Name)"
|
176 |
+
|
177 |
+
# Set values required in priority: CmdLine, ConfigFile, Default
|
178 |
+
# First, get the location of the virtual environment, it might not be
|
179 |
+
# VenvExecDir if specified on the command line.
|
180 |
+
if ($VenvDir) {
|
181 |
+
Write-Verbose "VenvDir given as parameter, using '$VenvDir' to determine values"
|
182 |
+
}
|
183 |
+
else {
|
184 |
+
Write-Verbose "VenvDir not given as a parameter, using parent directory name as VenvDir."
|
185 |
+
$VenvDir = $VenvExecDir.Parent.FullName.TrimEnd("\\/")
|
186 |
+
Write-Verbose "VenvDir=$VenvDir"
|
187 |
+
}
|
188 |
+
|
189 |
+
# Next, read the `pyvenv.cfg` file to determine any required value such
|
190 |
+
# as `prompt`.
|
191 |
+
$pyvenvCfg = Get-PyVenvConfig -ConfigDir $VenvDir
|
192 |
+
|
193 |
+
# Next, set the prompt from the command line, or the config file, or
|
194 |
+
# just use the name of the virtual environment folder.
|
195 |
+
if ($Prompt) {
|
196 |
+
Write-Verbose "Prompt specified as argument, using '$Prompt'"
|
197 |
+
}
|
198 |
+
else {
|
199 |
+
Write-Verbose "Prompt not specified as argument to script, checking pyvenv.cfg value"
|
200 |
+
if ($pyvenvCfg -and $pyvenvCfg['prompt']) {
|
201 |
+
Write-Verbose " Setting based on value in pyvenv.cfg='$($pyvenvCfg['prompt'])'"
|
202 |
+
$Prompt = $pyvenvCfg['prompt'];
|
203 |
+
}
|
204 |
+
else {
|
205 |
+
Write-Verbose " Setting prompt based on parent's directory's name. (Is the directory name passed to venv module when creating the virtual environment)"
|
206 |
+
Write-Verbose " Got leaf-name of $VenvDir='$(Split-Path -Path $venvDir -Leaf)'"
|
207 |
+
$Prompt = Split-Path -Path $venvDir -Leaf
|
208 |
+
}
|
209 |
+
}
|
210 |
+
|
211 |
+
Write-Verbose "Prompt = '$Prompt'"
|
212 |
+
Write-Verbose "VenvDir='$VenvDir'"
|
213 |
+
|
214 |
+
# Deactivate any currently active virtual environment, but leave the
|
215 |
+
# deactivate function in place.
|
216 |
+
deactivate -nondestructive
|
217 |
+
|
218 |
+
# Now set the environment variable VIRTUAL_ENV, used by many tools to determine
|
219 |
+
# that there is an activated venv.
|
220 |
+
$env:VIRTUAL_ENV = $VenvDir
|
221 |
+
|
222 |
+
if (-not $Env:VIRTUAL_ENV_DISABLE_PROMPT) {
|
223 |
+
|
224 |
+
Write-Verbose "Setting prompt to '$Prompt'"
|
225 |
+
|
226 |
+
# Set the prompt to include the env name
|
227 |
+
# Make sure _OLD_VIRTUAL_PROMPT is global
|
228 |
+
function global:_OLD_VIRTUAL_PROMPT { "" }
|
229 |
+
Copy-Item -Path function:prompt -Destination function:_OLD_VIRTUAL_PROMPT
|
230 |
+
New-Variable -Name _PYTHON_VENV_PROMPT_PREFIX -Description "Python virtual environment prompt prefix" -Scope Global -Option ReadOnly -Visibility Public -Value $Prompt
|
231 |
+
|
232 |
+
function global:prompt {
|
233 |
+
Write-Host -NoNewline -ForegroundColor Green "($_PYTHON_VENV_PROMPT_PREFIX) "
|
234 |
+
_OLD_VIRTUAL_PROMPT
|
235 |
+
}
|
236 |
+
$env:VIRTUAL_ENV_PROMPT = $Prompt
|
237 |
+
}
|
238 |
+
|
239 |
+
# Clear PYTHONHOME
|
240 |
+
if (Test-Path -Path Env:PYTHONHOME) {
|
241 |
+
Copy-Item -Path Env:PYTHONHOME -Destination Env:_OLD_VIRTUAL_PYTHONHOME
|
242 |
+
Remove-Item -Path Env:PYTHONHOME
|
243 |
+
}
|
244 |
+
|
245 |
+
# Add the venv to the PATH
|
246 |
+
Copy-Item -Path Env:PATH -Destination Env:_OLD_VIRTUAL_PATH
|
247 |
+
$Env:PATH = "$VenvExecDir$([System.IO.Path]::PathSeparator)$Env:PATH"
|
248 |
+
|
249 |
+
# SIG # Begin signature block
|
250 |
+
# MIIpigYJKoZIhvcNAQcCoIIpezCCKXcCAQExDzANBglghkgBZQMEAgEFADB5Bgor
|
251 |
+
# BgEEAYI3AgEEoGswaTA0BgorBgEEAYI3AgEeMCYCAwEAAAQQH8w7YFlLCE63JNLG
|
252 |
+
# KX7zUQIBAAIBAAIBAAIBAAIBADAxMA0GCWCGSAFlAwQCAQUABCBnL745ElCYk8vk
|
253 |
+
# dBtMuQhLeWJ3ZGfzKW4DHCYzAn+QB6CCDi8wggawMIIEmKADAgECAhAIrUCyYNKc
|
254 |
+
# TJ9ezam9k67ZMA0GCSqGSIb3DQEBDAUAMGIxCzAJBgNVBAYTAlVTMRUwEwYDVQQK
|
255 |
+
# EwxEaWdpQ2VydCBJbmMxGTAXBgNVBAsTEHd3dy5kaWdpY2VydC5jb20xITAfBgNV
|
256 |
+
# BAMTGERpZ2lDZXJ0IFRydXN0ZWQgUm9vdCBHNDAeFw0yMTA0MjkwMDAwMDBaFw0z
|
257 |
+
# NjA0MjgyMzU5NTlaMGkxCzAJBgNVBAYTAlVTMRcwFQYDVQQKEw5EaWdpQ2VydCwg
|
258 |
+
# SW5jLjFBMD8GA1UEAxM4RGlnaUNlcnQgVHJ1c3RlZCBHNCBDb2RlIFNpZ25pbmcg
|
259 |
+
# UlNBNDA5NiBTSEEzODQgMjAyMSBDQTEwggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAw
|
260 |
+
# ggIKAoICAQDVtC9C0CiteLdd1TlZG7GIQvUzjOs9gZdwxbvEhSYwn6SOaNhc9es0
|
261 |
+
# JAfhS0/TeEP0F9ce2vnS1WcaUk8OoVf8iJnBkcyBAz5NcCRks43iCH00fUyAVxJr
|
262 |
+
# Q5qZ8sU7H/Lvy0daE6ZMswEgJfMQ04uy+wjwiuCdCcBlp/qYgEk1hz1RGeiQIXhF
|
263 |
+
# LqGfLOEYwhrMxe6TSXBCMo/7xuoc82VokaJNTIIRSFJo3hC9FFdd6BgTZcV/sk+F
|
264 |
+
# LEikVoQ11vkunKoAFdE3/hoGlMJ8yOobMubKwvSnowMOdKWvObarYBLj6Na59zHh
|
265 |
+
# 3K3kGKDYwSNHR7OhD26jq22YBoMbt2pnLdK9RBqSEIGPsDsJ18ebMlrC/2pgVItJ
|
266 |
+
# wZPt4bRc4G/rJvmM1bL5OBDm6s6R9b7T+2+TYTRcvJNFKIM2KmYoX7BzzosmJQay
|
267 |
+
# g9Rc9hUZTO1i4F4z8ujo7AqnsAMrkbI2eb73rQgedaZlzLvjSFDzd5Ea/ttQokbI
|
268 |
+
# YViY9XwCFjyDKK05huzUtw1T0PhH5nUwjewwk3YUpltLXXRhTT8SkXbev1jLchAp
|
269 |
+
# QfDVxW0mdmgRQRNYmtwmKwH0iU1Z23jPgUo+QEdfyYFQc4UQIyFZYIpkVMHMIRro
|
270 |
+
# OBl8ZhzNeDhFMJlP/2NPTLuqDQhTQXxYPUez+rbsjDIJAsxsPAxWEQIDAQABo4IB
|
271 |
+
# WTCCAVUwEgYDVR0TAQH/BAgwBgEB/wIBADAdBgNVHQ4EFgQUaDfg67Y7+F8Rhvv+
|
272 |
+
# YXsIiGX0TkIwHwYDVR0jBBgwFoAU7NfjgtJxXWRM3y5nP+e6mK4cD08wDgYDVR0P
|
273 |
+
# AQH/BAQDAgGGMBMGA1UdJQQMMAoGCCsGAQUFBwMDMHcGCCsGAQUFBwEBBGswaTAk
|
274 |
+
# BggrBgEFBQcwAYYYaHR0cDovL29jc3AuZGlnaWNlcnQuY29tMEEGCCsGAQUFBzAC
|
275 |
+
# hjVodHRwOi8vY2FjZXJ0cy5kaWdpY2VydC5jb20vRGlnaUNlcnRUcnVzdGVkUm9v
|
276 |
+
# dEc0LmNydDBDBgNVHR8EPDA6MDigNqA0hjJodHRwOi8vY3JsMy5kaWdpY2VydC5j
|
277 |
+
# b20vRGlnaUNlcnRUcnVzdGVkUm9vdEc0LmNybDAcBgNVHSAEFTATMAcGBWeBDAED
|
278 |
+
# MAgGBmeBDAEEATANBgkqhkiG9w0BAQwFAAOCAgEAOiNEPY0Idu6PvDqZ01bgAhql
|
279 |
+
# +Eg08yy25nRm95RysQDKr2wwJxMSnpBEn0v9nqN8JtU3vDpdSG2V1T9J9Ce7FoFF
|
280 |
+
# UP2cvbaF4HZ+N3HLIvdaqpDP9ZNq4+sg0dVQeYiaiorBtr2hSBh+3NiAGhEZGM1h
|
281 |
+
# mYFW9snjdufE5BtfQ/g+lP92OT2e1JnPSt0o618moZVYSNUa/tcnP/2Q0XaG3Ryw
|
282 |
+
# YFzzDaju4ImhvTnhOE7abrs2nfvlIVNaw8rpavGiPttDuDPITzgUkpn13c5Ubdld
|
283 |
+
# AhQfQDN8A+KVssIhdXNSy0bYxDQcoqVLjc1vdjcshT8azibpGL6QB7BDf5WIIIJw
|
284 |
+
# 8MzK7/0pNVwfiThV9zeKiwmhywvpMRr/LhlcOXHhvpynCgbWJme3kuZOX956rEnP
|
285 |
+
# LqR0kq3bPKSchh/jwVYbKyP/j7XqiHtwa+aguv06P0WmxOgWkVKLQcBIhEuWTatE
|
286 |
+
# QOON8BUozu3xGFYHKi8QxAwIZDwzj64ojDzLj4gLDb879M4ee47vtevLt/B3E+bn
|
287 |
+
# KD+sEq6lLyJsQfmCXBVmzGwOysWGw/YmMwwHS6DTBwJqakAwSEs0qFEgu60bhQji
|
288 |
+
# WQ1tygVQK+pKHJ6l/aCnHwZ05/LWUpD9r4VIIflXO7ScA+2GRfS0YW6/aOImYIbq
|
289 |
+
# yK+p/pQd52MbOoZWeE4wggd3MIIFX6ADAgECAhAHHxQbizANJfMU6yMM0NHdMA0G
|
290 |
+
# CSqGSIb3DQEBCwUAMGkxCzAJBgNVBAYTAlVTMRcwFQYDVQQKEw5EaWdpQ2VydCwg
|
291 |
+
# SW5jLjFBMD8GA1UEAxM4RGlnaUNlcnQgVHJ1c3RlZCBHNCBDb2RlIFNpZ25pbmcg
|
292 |
+
# UlNBNDA5NiBTSEEzODQgMjAyMSBDQTEwHhcNMjIwMTE3MDAwMDAwWhcNMjUwMTE1
|
293 |
+
# MjM1OTU5WjB8MQswCQYDVQQGEwJVUzEPMA0GA1UECBMGT3JlZ29uMRIwEAYDVQQH
|
294 |
+
# EwlCZWF2ZXJ0b24xIzAhBgNVBAoTGlB5dGhvbiBTb2Z0d2FyZSBGb3VuZGF0aW9u
|
295 |
+
# MSMwIQYDVQQDExpQeXRob24gU29mdHdhcmUgRm91bmRhdGlvbjCCAiIwDQYJKoZI
|
296 |
+
# hvcNAQEBBQADggIPADCCAgoCggIBAKgc0BTT+iKbtK6f2mr9pNMUTcAJxKdsuOiS
|
297 |
+
# YgDFfwhjQy89koM7uP+QV/gwx8MzEt3c9tLJvDccVWQ8H7mVsk/K+X+IufBLCgUi
|
298 |
+
# 0GGAZUegEAeRlSXxxhYScr818ma8EvGIZdiSOhqjYc4KnfgfIS4RLtZSrDFG2tN1
|
299 |
+
# 6yS8skFa3IHyvWdbD9PvZ4iYNAS4pjYDRjT/9uzPZ4Pan+53xZIcDgjiTwOh8VGu
|
300 |
+
# ppxcia6a7xCyKoOAGjvCyQsj5223v1/Ig7Dp9mGI+nh1E3IwmyTIIuVHyK6Lqu35
|
301 |
+
# 2diDY+iCMpk9ZanmSjmB+GMVs+H/gOiofjjtf6oz0ki3rb7sQ8fTnonIL9dyGTJ0
|
302 |
+
# ZFYKeb6BLA66d2GALwxZhLe5WH4Np9HcyXHACkppsE6ynYjTOd7+jN1PRJahN1oE
|
303 |
+
# RzTzEiV6nCO1M3U1HbPTGyq52IMFSBM2/07WTJSbOeXjvYR7aUxK9/ZkJiacl2iZ
|
304 |
+
# I7IWe7JKhHohqKuceQNyOzxTakLcRkzynvIrk33R9YVqtB4L6wtFxhUjvDnQg16x
|
305 |
+
# ot2KVPdfyPAWd81wtZADmrUtsZ9qG79x1hBdyOl4vUtVPECuyhCxaw+faVjumapP
|
306 |
+
# Unwo8ygflJJ74J+BYxf6UuD7m8yzsfXWkdv52DjL74TxzuFTLHPyARWCSCAbzn3Z
|
307 |
+
# Ily+qIqDAgMBAAGjggIGMIICAjAfBgNVHSMEGDAWgBRoN+Drtjv4XxGG+/5hewiI
|
308 |
+
# ZfROQjAdBgNVHQ4EFgQUt/1Teh2XDuUj2WW3siYWJgkZHA8wDgYDVR0PAQH/BAQD
|
309 |
+
# AgeAMBMGA1UdJQQMMAoGCCsGAQUFBwMDMIG1BgNVHR8Ega0wgaowU6BRoE+GTWh0
|
310 |
+
# dHA6Ly9jcmwzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRydXN0ZWRHNENvZGVTaWdu
|
311 |
+
# aW5nUlNBNDA5NlNIQTM4NDIwMjFDQTEuY3JsMFOgUaBPhk1odHRwOi8vY3JsNC5k
|
312 |
+
# aWdpY2VydC5jb20vRGlnaUNlcnRUcnVzdGVkRzRDb2RlU2lnbmluZ1JTQTQwOTZT
|
313 |
+
# SEEzODQyMDIxQ0ExLmNybDA+BgNVHSAENzA1MDMGBmeBDAEEATApMCcGCCsGAQUF
|
314 |
+
# BwIBFhtodHRwOi8vd3d3LmRpZ2ljZXJ0LmNvbS9DUFMwgZQGCCsGAQUFBwEBBIGH
|
315 |
+
# MIGEMCQGCCsGAQUFBzABhhhodHRwOi8vb2NzcC5kaWdpY2VydC5jb20wXAYIKwYB
|
316 |
+
# BQUHMAKGUGh0dHA6Ly9jYWNlcnRzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRydXN0
|
317 |
+
# ZWRHNENvZGVTaWduaW5nUlNBNDA5NlNIQTM4NDIwMjFDQTEuY3J0MAwGA1UdEwEB
|
318 |
+
# /wQCMAAwDQYJKoZIhvcNAQELBQADggIBABxv4AeV/5ltkELHSC63fXAFYS5tadcW
|
319 |
+
# TiNc2rskrNLrfH1Ns0vgSZFoQxYBFKI159E8oQQ1SKbTEubZ/B9kmHPhprHya08+
|
320 |
+
# VVzxC88pOEvz68nA82oEM09584aILqYmj8Pj7h/kmZNzuEL7WiwFa/U1hX+XiWfL
|
321 |
+
# IJQsAHBla0i7QRF2de8/VSF0XXFa2kBQ6aiTsiLyKPNbaNtbcucaUdn6vVUS5izW
|
322 |
+
# OXM95BSkFSKdE45Oq3FForNJXjBvSCpwcP36WklaHL+aHu1upIhCTUkzTHMh8b86
|
323 |
+
# WmjRUqbrnvdyR2ydI5l1OqcMBjkpPpIV6wcc+KY/RH2xvVuuoHjlUjwq2bHiNoX+
|
324 |
+
# W1scCpnA8YTs2d50jDHUgwUo+ciwpffH0Riq132NFmrH3r67VaN3TuBxjI8SIZM5
|
325 |
+
# 8WEDkbeoriDk3hxU8ZWV7b8AW6oyVBGfM06UgkfMb58h+tJPrFx8VI/WLq1dTqMf
|
326 |
+
# ZOm5cuclMnUHs2uqrRNtnV8UfidPBL4ZHkTcClQbCoz0UbLhkiDvIS00Dn+BBcxw
|
327 |
+
# /TKqVL4Oaz3bkMSsM46LciTeucHY9ExRVt3zy7i149sd+F4QozPqn7FrSVHXmem3
|
328 |
+
# r7bjyHTxOgqxRCVa18Vtx7P/8bYSBeS+WHCKcliFCecspusCDSlnRUjZwyPdP0VH
|
329 |
+
# xaZg2unjHY3rMYIasTCCGq0CAQEwfTBpMQswCQYDVQQGEwJVUzEXMBUGA1UEChMO
|
330 |
+
# RGlnaUNlcnQsIEluYy4xQTA/BgNVBAMTOERpZ2lDZXJ0IFRydXN0ZWQgRzQgQ29k
|
331 |
+
# ZSBTaWduaW5nIFJTQTQwOTYgU0hBMzg0IDIwMjEgQ0ExAhAHHxQbizANJfMU6yMM
|
332 |
+
# 0NHdMA0GCWCGSAFlAwQCAQUAoIHEMBkGCSqGSIb3DQEJAzEMBgorBgEEAYI3AgEE
|
333 |
+
# MBwGCisGAQQBgjcCAQsxDjAMBgorBgEEAYI3AgEVMC8GCSqGSIb3DQEJBDEiBCBn
|
334 |
+
# AZ6P7YvTwq0fbF62o7E75R0LxsW5OtyYiFESQckLhjBYBgorBgEEAYI3AgEMMUow
|
335 |
+
# SKBGgEQAQgB1AGkAbAB0ADoAIABSAGUAbABlAGEAcwBlAF8AdgAzAC4AMQAxAC4A
|
336 |
+
# MABfADIAMAAyADIAMQAwADIANAAuADAAMTANBgkqhkiG9w0BAQEFAASCAgAu2uG5
|
337 |
+
# zPAAKY4N8BVMzMPRSoTqq2HAcX+oqvto72DGzHLKlfAuuyf59saf7TQZQ04Ao1ni
|
338 |
+
# EvpzZ8C4Wv7yu8RyPwJQThIuFQuhMgB+Zscl+YDnAo5+GFTBpevgcG2n2ClHAPuT
|
339 |
+
# 7aXe3+5wChDpMqyusrBYws+8R6tg8rKFyRhQndxIJkIMlZhoh1qI3tRypW6e2r5l
|
340 |
+
# Uf4pPDkNBBySzjNOupTyv1/d2Y31Ise8xLrLbuMLYxtir/5A0z6GlUueoecpe9TS
|
341 |
+
# uEqz2bI+HZbGC6xK2BU4vW8s7qefVTmPFAf3JiCjZZ46qFAg9jnWCRzAA/3jOtu6
|
342 |
+
# V345rFhCRJxPKz4M96B5mUCnMU0BB4cHJFKZfezd5phtExi1///WcnKNkpNTto+d
|
343 |
+
# etpWbJ87DibBro3ZhDPh9FpHW2jxy2IQBZo02Udbwfd7aoKhRf7MCLqZUIziPjRS
|
344 |
+
# FcA1hyOzYk4XfHK1qW3Wpflduz86UGDbURWP3XhXQNaSScJGOhVylZbiBWcjFKlD
|
345 |
+
# E/sl+bDyafUy0jLur6/Vl4H2xCgXbJlEazr04QfizW9N9x2G6sDkdbQd4k3kSEJt
|
346 |
+
# UOufbrdjDY1MRd/NlnjVGY+zslEDN9QJQuKq00SJagicDJ+vIzg6J7YjnRfDGLAi
|
347 |
+
# RJb9rXxuQyEoSTdtxQgnPNkb6vCNQz80bjHmoqGCFz4wghc6BgorBgEEAYI3AwMB
|
348 |
+
# MYIXKjCCFyYGCSqGSIb3DQEHAqCCFxcwghcTAgEDMQ8wDQYJYIZIAWUDBAIBBQAw
|
349 |
+
# eAYLKoZIhvcNAQkQAQSgaQRnMGUCAQEGCWCGSAGG/WwHATAxMA0GCWCGSAFlAwQC
|
350 |
+
# AQUABCCJnxONky4RAgM+R4O2F+soqJ9cjrZDLL3JqXN+msPWngIRAPgphjs42egI
|
351 |
+
# Fn/RXf6+TgkYDzIwMjIxMDI0MTgzMzM4WqCCEwcwggbAMIIEqKADAgECAhAMTWly
|
352 |
+
# S5T6PCpKPSkHgD1aMA0GCSqGSIb3DQEBCwUAMGMxCzAJBgNVBAYTAlVTMRcwFQYD
|
353 |
+
# VQQKEw5EaWdpQ2VydCwgSW5jLjE7MDkGA1UEAxMyRGlnaUNlcnQgVHJ1c3RlZCBH
|
354 |
+
# NCBSU0E0MDk2IFNIQTI1NiBUaW1lU3RhbXBpbmcgQ0EwHhcNMjIwOTIxMDAwMDAw
|
355 |
+
# WhcNMzMxMTIxMjM1OTU5WjBGMQswCQYDVQQGEwJVUzERMA8GA1UEChMIRGlnaUNl
|
356 |
+
# cnQxJDAiBgNVBAMTG0RpZ2lDZXJ0IFRpbWVzdGFtcCAyMDIyIC0gMjCCAiIwDQYJ
|
357 |
+
# KoZIhvcNAQEBBQADggIPADCCAgoCggIBAM/spSY6xqnya7uNwQ2a26HoFIV0Mxom
|
358 |
+
# rNAcVR4eNm28klUMYfSdCXc9FZYIL2tkpP0GgxbXkZI4HDEClvtysZc6Va8z7GGK
|
359 |
+
# 6aYo25BjXL2JU+A6LYyHQq4mpOS7eHi5ehbhVsbAumRTuyoW51BIu4hpDIjG8b7g
|
360 |
+
# L307scpTjUCDHufLckkoHkyAHoVW54Xt8mG8qjoHffarbuVm3eJc9S/tjdRNlYRo
|
361 |
+
# 44DLannR0hCRRinrPibytIzNTLlmyLuqUDgN5YyUXRlav/V7QG5vFqianJVHhoV5
|
362 |
+
# PgxeZowaCiS+nKrSnLb3T254xCg/oxwPUAY3ugjZNaa1Htp4WB056PhMkRCWfk3h
|
363 |
+
# 3cKtpX74LRsf7CtGGKMZ9jn39cFPcS6JAxGiS7uYv/pP5Hs27wZE5FX/NurlfDHn
|
364 |
+
# 88JSxOYWe1p+pSVz28BqmSEtY+VZ9U0vkB8nt9KrFOU4ZodRCGv7U0M50GT6Vs/g
|
365 |
+
# 9ArmFG1keLuY/ZTDcyHzL8IuINeBrNPxB9ThvdldS24xlCmL5kGkZZTAWOXlLimQ
|
366 |
+
# prdhZPrZIGwYUWC6poEPCSVT8b876asHDmoHOWIZydaFfxPZjXnPYsXs4Xu5zGcT
|
367 |
+
# B5rBeO3GiMiwbjJ5xwtZg43G7vUsfHuOy2SJ8bHEuOdTXl9V0n0ZKVkDTvpd6kVz
|
368 |
+
# HIR+187i1Dp3AgMBAAGjggGLMIIBhzAOBgNVHQ8BAf8EBAMCB4AwDAYDVR0TAQH/
|
369 |
+
# BAIwADAWBgNVHSUBAf8EDDAKBggrBgEFBQcDCDAgBgNVHSAEGTAXMAgGBmeBDAEE
|
370 |
+
# AjALBglghkgBhv1sBwEwHwYDVR0jBBgwFoAUuhbZbU2FL3MpdpovdYxqII+eyG8w
|
371 |
+
# HQYDVR0OBBYEFGKK3tBh/I8xFO2XC809KpQU31KcMFoGA1UdHwRTMFEwT6BNoEuG
|
372 |
+
# SWh0dHA6Ly9jcmwzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRydXN0ZWRHNFJTQTQw
|
373 |
+
# OTZTSEEyNTZUaW1lU3RhbXBpbmdDQS5jcmwwgZAGCCsGAQUFBwEBBIGDMIGAMCQG
|
374 |
+
# CCsGAQUFBzABhhhodHRwOi8vb2NzcC5kaWdpY2VydC5jb20wWAYIKwYBBQUHMAKG
|
375 |
+
# TGh0dHA6Ly9jYWNlcnRzLmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRydXN0ZWRHNFJT
|
376 |
+
# QTQwOTZTSEEyNTZUaW1lU3RhbXBpbmdDQS5jcnQwDQYJKoZIhvcNAQELBQADggIB
|
377 |
+
# AFWqKhrzRvN4Vzcw/HXjT9aFI/H8+ZU5myXm93KKmMN31GT8Ffs2wklRLHiIY1UJ
|
378 |
+
# RjkA/GnUypsp+6M/wMkAmxMdsJiJ3HjyzXyFzVOdr2LiYWajFCpFh0qYQitQ/Bu1
|
379 |
+
# nggwCfrkLdcJiXn5CeaIzn0buGqim8FTYAnoo7id160fHLjsmEHw9g6A++T/350Q
|
380 |
+
# p+sAul9Kjxo6UrTqvwlJFTU2WZoPVNKyG39+XgmtdlSKdG3K0gVnK3br/5iyJpU4
|
381 |
+
# GYhEFOUKWaJr5yI+RCHSPxzAm+18SLLYkgyRTzxmlK9dAlPrnuKe5NMfhgFknADC
|
382 |
+
# 6Vp0dQ094XmIvxwBl8kZI4DXNlpflhaxYwzGRkA7zl011Fk+Q5oYrsPJy8P7mxNf
|
383 |
+
# arXH4PMFw1nfJ2Ir3kHJU7n/NBBn9iYymHv+XEKUgZSCnawKi8ZLFUrTmJBFYDOA
|
384 |
+
# 4CPe+AOk9kVH5c64A0JH6EE2cXet/aLol3ROLtoeHYxayB6a1cLwxiKoT5u92Bya
|
385 |
+
# UcQvmvZfpyeXupYuhVfAYOd4Vn9q78KVmksRAsiCnMkaBXy6cbVOepls9Oie1FqY
|
386 |
+
# yJ+/jbsYXEP10Cro4mLueATbvdH7WwqocH7wl4R44wgDXUcsY6glOJcB0j862uXl
|
387 |
+
# 9uab3H4szP8XTE0AotjWAQ64i+7m4HJViSwnGWH2dwGMMIIGrjCCBJagAwIBAgIQ
|
388 |
+
# BzY3tyRUfNhHrP0oZipeWzANBgkqhkiG9w0BAQsFADBiMQswCQYDVQQGEwJVUzEV
|
389 |
+
# MBMGA1UEChMMRGlnaUNlcnQgSW5jMRkwFwYDVQQLExB3d3cuZGlnaWNlcnQuY29t
|
390 |
+
# MSEwHwYDVQQDExhEaWdpQ2VydCBUcnVzdGVkIFJvb3QgRzQwHhcNMjIwMzIzMDAw
|
391 |
+
# MDAwWhcNMzcwMzIyMjM1OTU5WjBjMQswCQYDVQQGEwJVUzEXMBUGA1UEChMORGln
|
392 |
+
# aUNlcnQsIEluYy4xOzA5BgNVBAMTMkRpZ2lDZXJ0IFRydXN0ZWQgRzQgUlNBNDA5
|
393 |
+
# NiBTSEEyNTYgVGltZVN0YW1waW5nIENBMIICIjANBgkqhkiG9w0BAQEFAAOCAg8A
|
394 |
+
# MIICCgKCAgEAxoY1BkmzwT1ySVFVxyUDxPKRN6mXUaHW0oPRnkyibaCwzIP5WvYR
|
395 |
+
# oUQVQl+kiPNo+n3znIkLf50fng8zH1ATCyZzlm34V6gCff1DtITaEfFzsbPuK4CE
|
396 |
+
# iiIY3+vaPcQXf6sZKz5C3GeO6lE98NZW1OcoLevTsbV15x8GZY2UKdPZ7Gnf2ZCH
|
397 |
+
# RgB720RBidx8ald68Dd5n12sy+iEZLRS8nZH92GDGd1ftFQLIWhuNyG7QKxfst5K
|
398 |
+
# fc71ORJn7w6lY2zkpsUdzTYNXNXmG6jBZHRAp8ByxbpOH7G1WE15/tePc5OsLDni
|
399 |
+
# pUjW8LAxE6lXKZYnLvWHpo9OdhVVJnCYJn+gGkcgQ+NDY4B7dW4nJZCYOjgRs/b2
|
400 |
+
# nuY7W+yB3iIU2YIqx5K/oN7jPqJz+ucfWmyU8lKVEStYdEAoq3NDzt9KoRxrOMUp
|
401 |
+
# 88qqlnNCaJ+2RrOdOqPVA+C/8KI8ykLcGEh/FDTP0kyr75s9/g64ZCr6dSgkQe1C
|
402 |
+
# vwWcZklSUPRR8zZJTYsg0ixXNXkrqPNFYLwjjVj33GHek/45wPmyMKVM1+mYSlg+
|
403 |
+
# 0wOI/rOP015LdhJRk8mMDDtbiiKowSYI+RQQEgN9XyO7ZONj4KbhPvbCdLI/Hgl2
|
404 |
+
# 7KtdRnXiYKNYCQEoAA6EVO7O6V3IXjASvUaetdN2udIOa5kM0jO0zbECAwEAAaOC
|
405 |
+
# AV0wggFZMBIGA1UdEwEB/wQIMAYBAf8CAQAwHQYDVR0OBBYEFLoW2W1NhS9zKXaa
|
406 |
+
# L3WMaiCPnshvMB8GA1UdIwQYMBaAFOzX44LScV1kTN8uZz/nupiuHA9PMA4GA1Ud
|
407 |
+
# DwEB/wQEAwIBhjATBgNVHSUEDDAKBggrBgEFBQcDCDB3BggrBgEFBQcBAQRrMGkw
|
408 |
+
# JAYIKwYBBQUHMAGGGGh0dHA6Ly9vY3NwLmRpZ2ljZXJ0LmNvbTBBBggrBgEFBQcw
|
409 |
+
# AoY1aHR0cDovL2NhY2VydHMuZGlnaWNlcnQuY29tL0RpZ2lDZXJ0VHJ1c3RlZFJv
|
410 |
+
# b3RHNC5jcnQwQwYDVR0fBDwwOjA4oDagNIYyaHR0cDovL2NybDMuZGlnaWNlcnQu
|
411 |
+
# Y29tL0RpZ2lDZXJ0VHJ1c3RlZFJvb3RHNC5jcmwwIAYDVR0gBBkwFzAIBgZngQwB
|
412 |
+
# BAIwCwYJYIZIAYb9bAcBMA0GCSqGSIb3DQEBCwUAA4ICAQB9WY7Ak7ZvmKlEIgF+
|
413 |
+
# ZtbYIULhsBguEE0TzzBTzr8Y+8dQXeJLKftwig2qKWn8acHPHQfpPmDI2AvlXFvX
|
414 |
+
# bYf6hCAlNDFnzbYSlm/EUExiHQwIgqgWvalWzxVzjQEiJc6VaT9Hd/tydBTX/6tP
|
415 |
+
# iix6q4XNQ1/tYLaqT5Fmniye4Iqs5f2MvGQmh2ySvZ180HAKfO+ovHVPulr3qRCy
|
416 |
+
# Xen/KFSJ8NWKcXZl2szwcqMj+sAngkSumScbqyQeJsG33irr9p6xeZmBo1aGqwpF
|
417 |
+
# yd/EjaDnmPv7pp1yr8THwcFqcdnGE4AJxLafzYeHJLtPo0m5d2aR8XKc6UsCUqc3
|
418 |
+
# fpNTrDsdCEkPlM05et3/JWOZJyw9P2un8WbDQc1PtkCbISFA0LcTJM3cHXg65J6t
|
419 |
+
# 5TRxktcma+Q4c6umAU+9Pzt4rUyt+8SVe+0KXzM5h0F4ejjpnOHdI/0dKNPH+ejx
|
420 |
+
# mF/7K9h+8kaddSweJywm228Vex4Ziza4k9Tm8heZWcpw8De/mADfIBZPJ/tgZxah
|
421 |
+
# ZrrdVcA6KYawmKAr7ZVBtzrVFZgxtGIJDwq9gdkT/r+k0fNX2bwE+oLeMt8EifAA
|
422 |
+
# zV3C+dAjfwAL5HYCJtnwZXZCpimHCUcr5n8apIUP/JiW9lVUKx+A+sDyDivl1vup
|
423 |
+
# L0QVSucTDh3bNzgaoSv27dZ8/DCCBY0wggR1oAMCAQICEA6bGI750C3n79tQ4ghA
|
424 |
+
# GFowDQYJKoZIhvcNAQEMBQAwZTELMAkGA1UEBhMCVVMxFTATBgNVBAoTDERpZ2lD
|
425 |
+
# ZXJ0IEluYzEZMBcGA1UECxMQd3d3LmRpZ2ljZXJ0LmNvbTEkMCIGA1UEAxMbRGln
|
426 |
+
# aUNlcnQgQXNzdXJlZCBJRCBSb290IENBMB4XDTIyMDgwMTAwMDAwMFoXDTMxMTEw
|
427 |
+
# OTIzNTk1OVowYjELMAkGA1UEBhMCVVMxFTATBgNVBAoTDERpZ2lDZXJ0IEluYzEZ
|
428 |
+
# MBcGA1UECxMQd3d3LmRpZ2ljZXJ0LmNvbTEhMB8GA1UEAxMYRGlnaUNlcnQgVHJ1
|
429 |
+
# c3RlZCBSb290IEc0MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAv+aQ
|
430 |
+
# c2jeu+RdSjwwIjBpM+zCpyUuySE98orYWcLhKac9WKt2ms2uexuEDcQwH/MbpDgW
|
431 |
+
# 61bGl20dq7J58soR0uRf1gU8Ug9SH8aeFaV+vp+pVxZZVXKvaJNwwrK6dZlqczKU
|
432 |
+
# 0RBEEC7fgvMHhOZ0O21x4i0MG+4g1ckgHWMpLc7sXk7Ik/ghYZs06wXGXuxbGrzr
|
433 |
+
# yc/NrDRAX7F6Zu53yEioZldXn1RYjgwrt0+nMNlW7sp7XeOtyU9e5TXnMcvak17c
|
434 |
+
# jo+A2raRmECQecN4x7axxLVqGDgDEI3Y1DekLgV9iPWCPhCRcKtVgkEy19sEcypu
|
435 |
+
# kQF8IUzUvK4bA3VdeGbZOjFEmjNAvwjXWkmkwuapoGfdpCe8oU85tRFYF/ckXEaP
|
436 |
+
# ZPfBaYh2mHY9WV1CdoeJl2l6SPDgohIbZpp0yt5LHucOY67m1O+SkjqePdwA5EUl
|
437 |
+
# ibaaRBkrfsCUtNJhbesz2cXfSwQAzH0clcOP9yGyshG3u3/y1YxwLEFgqrFjGESV
|
438 |
+
# GnZifvaAsPvoZKYz0YkH4b235kOkGLimdwHhD5QMIR2yVCkliWzlDlJRR3S+Jqy2
|
439 |
+
# QXXeeqxfjT/JvNNBERJb5RBQ6zHFynIWIgnffEx1P2PsIV/EIFFrb7GrhotPwtZF
|
440 |
+
# X50g/KEexcCPorF+CiaZ9eRpL5gdLfXZqbId5RsCAwEAAaOCATowggE2MA8GA1Ud
|
441 |
+
# EwEB/wQFMAMBAf8wHQYDVR0OBBYEFOzX44LScV1kTN8uZz/nupiuHA9PMB8GA1Ud
|
442 |
+
# IwQYMBaAFEXroq/0ksuCMS1Ri6enIZ3zbcgPMA4GA1UdDwEB/wQEAwIBhjB5Bggr
|
443 |
+
# BgEFBQcBAQRtMGswJAYIKwYBBQUHMAGGGGh0dHA6Ly9vY3NwLmRpZ2ljZXJ0LmNv
|
444 |
+
# bTBDBggrBgEFBQcwAoY3aHR0cDovL2NhY2VydHMuZGlnaWNlcnQuY29tL0RpZ2lD
|
445 |
+
# ZXJ0QXNzdXJlZElEUm9vdENBLmNydDBFBgNVHR8EPjA8MDqgOKA2hjRodHRwOi8v
|
446 |
+
# Y3JsMy5kaWdpY2VydC5jb20vRGlnaUNlcnRBc3N1cmVkSURSb290Q0EuY3JsMBEG
|
447 |
+
# A1UdIAQKMAgwBgYEVR0gADANBgkqhkiG9w0BAQwFAAOCAQEAcKC/Q1xV5zhfoKN0
|
448 |
+
# Gz22Ftf3v1cHvZqsoYcs7IVeqRq7IviHGmlUIu2kiHdtvRoU9BNKei8ttzjv9P+A
|
449 |
+
# ufih9/Jy3iS8UgPITtAq3votVs/59PesMHqai7Je1M/RQ0SbQyHrlnKhSLSZy51P
|
450 |
+
# pwYDE3cnRNTnf+hZqPC/Lwum6fI0POz3A8eHqNJMQBk1RmppVLC4oVaO7KTVPeix
|
451 |
+
# 3P0c2PR3WlxUjG/voVA9/HYJaISfb8rbII01YBwCA8sgsKxYoA5AY8WYIsGyWfVV
|
452 |
+
# a88nq2x2zm8jLfR+cWojayL/ErhULSd+2DrZ8LaHlv1b0VysGMNNn3O3AamfV6pe
|
453 |
+
# KOK5lDGCA3YwggNyAgEBMHcwYzELMAkGA1UEBhMCVVMxFzAVBgNVBAoTDkRpZ2lD
|
454 |
+
# ZXJ0LCBJbmMuMTswOQYDVQQDEzJEaWdpQ2VydCBUcnVzdGVkIEc0IFJTQTQwOTYg
|
455 |
+
# U0hBMjU2IFRpbWVTdGFtcGluZyBDQQIQDE1pckuU+jwqSj0pB4A9WjANBglghkgB
|
456 |
+
# ZQMEAgEFAKCB0TAaBgkqhkiG9w0BCQMxDQYLKoZIhvcNAQkQAQQwHAYJKoZIhvcN
|
457 |
+
# AQkFMQ8XDTIyMTAyNDE4MzMzOFowKwYLKoZIhvcNAQkQAgwxHDAaMBgwFgQU84ci
|
458 |
+
# TYYzgpI1qZS8vY+W6f4cfHMwLwYJKoZIhvcNAQkEMSIEILoHmtH34MMtLSezOEUS
|
459 |
+
# 8z6MwtqV/PFPq/sNVq5aJnKMMDcGCyqGSIb3DQEJEAIvMSgwJjAkMCIEIMf04b4y
|
460 |
+
# KIkgq+ImOr4axPxP5ngcLWTQTIB1V6Ajtbb6MA0GCSqGSIb3DQEBAQUABIICAEtb
|
461 |
+
# WINxaVTjBdclvuFwJT/uHWvlOdcKzc1o+toRkFb1OA7shEdXFvjNU549TilTs8qQ
|
462 |
+
# bly8CbFcz3JzLVLrNKO7lr4GXd2iyJV5sv/XU4ED866fznOnFWtZJvxKGOdqN0W7
|
463 |
+
# 01pw7mIJ8+2aRqpow1ppPzju7VagRQ8fKmtj9Sg5N8Ja3+AehpjwM/PYzctan/1m
|
464 |
+
# ytIK/HCw5k/MeGmPVBs/fqbN0DT4KGrJ7YMySdYZMs0U9V7Ak7PelZLgw8BkNi1Y
|
465 |
+
# Rb9i+7/t9AaBlVYMy/6+gzdsnarnlSzV8/6Est8w4Ie7sBxx3Tpsokopb+oPF///
|
466 |
+
# 2cA3jMNToO9YfsqvgpTEkWwjWanC2cd26K8ikw0uu0klmaxNvYpP459/QU3JMyFj
|
467 |
+
# I4ReTxVXLZrQlzCDUUdLmLSeV1AugCOYOHM2RAv4r+3qxk0jBCfA8RRK+prLNjXE
|
468 |
+
# af1QEbeRRNr0418MtnBdIzxHnW8yffWfHmtDNJoyPqggkRU3Mb8Myu8QPD3ZiCPj
|
469 |
+
# F+HsKUntyCV64hr9BNLmkpbw+kUvGtC0/7sZF9Gyp/DKnnbQu8vSR+CaZQqVQxJo
|
470 |
+
# UeI7m44utNTSSZCJ9JV7bnniwqztrP/r2PTAxkUywoCzif6R863qJ/uQA0QQjq8t
|
471 |
+
# +aR822g6YVyJsLYQKbpEgshG2QwzGHun5HkvawJ8
|
472 |
+
# SIG # End signature block
|
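The `Get-PyVenvConfig` function in the script above splits each line of `pyvenv.cfg` on the first `=`, keeps the line only when both sides are non-empty, and strips one pair of surrounding quotes from the value. A rough Python equivalent of that parsing rule might look like the sketch below. The function name `parse_pyvenv_cfg` and the sample file contents are hypothetical and are not taken from the repository.

```python
import re


def parse_pyvenv_cfg(text: str) -> dict:
    """Parse `key = value` lines the way Get-PyVenvConfig describes."""
    config = {}
    for line in text.splitlines():
        # Split on the first '=' only, swallowing whitespace around it.
        parts = re.split(r"\s*=\s*", line, maxsplit=1)
        if len(parts) == 2 and parts[0] and parts[1]:
            key, val = parts
            # Drop the first and last character when the value starts with a quote.
            if val[0] in "'\"":
                val = val[1:-1]
            config[key] = val
    return config


# Hypothetical pyvenv.cfg contents, for illustration only.
SAMPLE_CFG = (
    "home = C:\\Python311\n"
    "include-system-site-packages = false\n"
    "version = 3.11.0\n"
    "prompt = 'llmrag'\n"
)

if __name__ == "__main__":
    print(parse_pyvenv_cfg(SAMPLE_CFG))
```

As in the PowerShell original, lines without an `=` or with an empty side are skipped, so the result is an empty mapping when the file has no usable entries.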
llmrag/Scripts/activate
ADDED
@@ -0,0 +1,69 @@
|
1 |
+
# This file must be used with "source bin/activate" *from bash*
|
2 |
+
# you cannot run it directly
|
3 |
+
|
4 |
+
deactivate () {
|
5 |
+
# reset old environment variables
|
6 |
+
if [ -n "${_OLD_VIRTUAL_PATH:-}" ] ; then
|
7 |
+
PATH="${_OLD_VIRTUAL_PATH:-}"
|
8 |
+
export PATH
|
9 |
+
unset _OLD_VIRTUAL_PATH
|
10 |
+
fi
|
11 |
+
if [ -n "${_OLD_VIRTUAL_PYTHONHOME:-}" ] ; then
|
12 |
+
PYTHONHOME="${_OLD_VIRTUAL_PYTHONHOME:-}"
|
13 |
+
export PYTHONHOME
|
14 |
+
unset _OLD_VIRTUAL_PYTHONHOME
|
15 |
+
fi
|
16 |
+
|
17 |
+
# This should detect bash and zsh, which have a hash command that must
|
18 |
+
# be called to get it to forget past commands. Without forgetting
|
19 |
+
# past commands the $PATH changes we made may not be respected
|
20 |
+
if [ -n "${BASH:-}" -o -n "${ZSH_VERSION:-}" ] ; then
|
21 |
+
hash -r 2> /dev/null
|
22 |
+
fi
|
23 |
+
|
24 |
+
if [ -n "${_OLD_VIRTUAL_PS1:-}" ] ; then
|
25 |
+
PS1="${_OLD_VIRTUAL_PS1:-}"
|
26 |
+
export PS1
|
27 |
+
unset _OLD_VIRTUAL_PS1
|
28 |
+
fi
|
29 |
+
|
30 |
+
unset VIRTUAL_ENV
|
31 |
+
unset VIRTUAL_ENV_PROMPT
|
32 |
+
if [ ! "${1:-}" = "nondestructive" ] ; then
|
33 |
+
# Self destruct!
|
34 |
+
unset -f deactivate
|
35 |
+
fi
|
36 |
+
}
|
37 |
+
|
38 |
+
# unset irrelevant variables
|
39 |
+
deactivate nondestructive
|
40 |
+
|
41 |
+
VIRTUAL_ENV="D:\llm-chatbot\rag-youtube-assistant\llmrag"
|
42 |
+
export VIRTUAL_ENV
|
43 |
+
|
44 |
+
_OLD_VIRTUAL_PATH="$PATH"
|
45 |
+
PATH="$VIRTUAL_ENV/Scripts:$PATH"
|
46 |
+
export PATH
|
47 |
+
|
48 |
+
# unset PYTHONHOME if set
|
49 |
+
# this will fail if PYTHONHOME is set to the empty string (which is bad anyway)
|
50 |
+
# could use `if (set -u; : $PYTHONHOME) ;` in bash
|
51 |
+
if [ -n "${PYTHONHOME:-}" ] ; then
|
52 |
+
_OLD_VIRTUAL_PYTHONHOME="${PYTHONHOME:-}"
|
53 |
+
unset PYTHONHOME
|
54 |
+
fi
|
55 |
+
|
56 |
+
if [ -z "${VIRTUAL_ENV_DISABLE_PROMPT:-}" ] ; then
|
57 |
+
_OLD_VIRTUAL_PS1="${PS1:-}"
|
58 |
+
PS1="(llmrag) ${PS1:-}"
|
59 |
+
export PS1
|
60 |
+
VIRTUAL_ENV_PROMPT="(llmrag) "
|
61 |
+
export VIRTUAL_ENV_PROMPT
|
62 |
+
fi
|
63 |
+
|
64 |
+
# This should detect bash and zsh, which have a hash command that must
|
65 |
+
# be called to get it to forget past commands. Without forgetting
|
66 |
+
# past commands the $PATH changes we made may not be respected
|
67 |
+
if [ -n "${BASH:-}" -o -n "${ZSH_VERSION:-}" ] ; then
|
68 |
+
hash -r 2> /dev/null
|
69 |
+
fi
|
llmrag/Scripts/activate.bat
ADDED
@@ -0,0 +1,34 @@
+@echo off
+
+rem This file is UTF-8 encoded, so we need to update the current code page while executing it
+for /f "tokens=2 delims=:." %%a in ('"%SystemRoot%\System32\chcp.com"') do (
+    set _OLD_CODEPAGE=%%a
+)
+if defined _OLD_CODEPAGE (
+    "%SystemRoot%\System32\chcp.com" 65001 > nul
+)
+
+set VIRTUAL_ENV=D:\llm-chatbot\rag-youtube-assistant\llmrag
+
+if not defined PROMPT set PROMPT=$P$G
+
+if defined _OLD_VIRTUAL_PROMPT set PROMPT=%_OLD_VIRTUAL_PROMPT%
+if defined _OLD_VIRTUAL_PYTHONHOME set PYTHONHOME=%_OLD_VIRTUAL_PYTHONHOME%
+
+set _OLD_VIRTUAL_PROMPT=%PROMPT%
+set PROMPT=(llmrag) %PROMPT%
+
+if defined PYTHONHOME set _OLD_VIRTUAL_PYTHONHOME=%PYTHONHOME%
+set PYTHONHOME=
+
+if defined _OLD_VIRTUAL_PATH set PATH=%_OLD_VIRTUAL_PATH%
+if not defined _OLD_VIRTUAL_PATH set _OLD_VIRTUAL_PATH=%PATH%
+
+set PATH=%VIRTUAL_ENV%\Scripts;%PATH%
+set VIRTUAL_ENV_PROMPT=(llmrag)
+
+:END
+if defined _OLD_CODEPAGE (
+    "%SystemRoot%\System32\chcp.com" %_OLD_CODEPAGE% > nul
+    set _OLD_CODEPAGE=
+)
llmrag/Scripts/deactivate.bat
ADDED
@@ -0,0 +1,22 @@
+@echo off
+
+if defined _OLD_VIRTUAL_PROMPT (
+    set "PROMPT=%_OLD_VIRTUAL_PROMPT%"
+)
+set _OLD_VIRTUAL_PROMPT=
+
+if defined _OLD_VIRTUAL_PYTHONHOME (
+    set "PYTHONHOME=%_OLD_VIRTUAL_PYTHONHOME%"
+    set _OLD_VIRTUAL_PYTHONHOME=
+)
+
+if defined _OLD_VIRTUAL_PATH (
+    set "PATH=%_OLD_VIRTUAL_PATH%"
+)
+
+set _OLD_VIRTUAL_PATH=
+
+set VIRTUAL_ENV=
+set VIRTUAL_ENV_PROMPT=
+
+:END
llmrag/Scripts/pip.exe
ADDED
Binary file (108 kB).
llmrag/Scripts/pip3.10.exe
ADDED
Binary file (108 kB).
llmrag/Scripts/pip3.11.exe
ADDED
Binary file (108 kB).
llmrag/Scripts/pip3.exe
ADDED
Binary file (108 kB).
llmrag/Scripts/python.exe
ADDED
Binary file (268 kB).
llmrag/Scripts/pythonw.exe
ADDED
Binary file (256 kB).
llmrag/pyvenv.cfg
ADDED
@@ -0,0 +1,5 @@
+home = C:\Python311
+include-system-site-packages = false
+version = 3.11.0
+executable = C:\Python311\python.exe
+command = C:\Python311\python.exe -m venv D:\llm-chatbot\rag-youtube-assistant\llmrag
requirements.txt
ADDED
@@ -0,0 +1,14 @@
+streamlit
+youtube_transcript_api
+sentence-transformers
+google-api-python-client
+google-auth-httplib2
+google-auth-oauthlib
+pandas
+numpy
+scikit-learn
+elasticsearch
+ollama
+requests
+matplotlib
+tqdm
run-docker-compose-windows.ps1
ADDED
@@ -0,0 +1,23 @@
+# Define the path to the .env file
+$envPath = ".\.env"
+
+# Check if the .env file exists
+if (Test-Path $envPath) {
+    # Read the .env file
+    $envContent = Get-Content $envPath
+
+    # Parse the environment variables
+    foreach ($line in $envContent) {
+        if ($line -match '^([^=]+)=(.*)$') {
+            $name = $matches[1]
+            $value = $matches[2]
+            [Environment]::SetEnvironmentVariable($name, $value, "Process")
+        }
+    }
+
+    # Run docker-compose
+    docker-compose up --build
+}
+else {
+    Write-Error "The .env file was not found at $envPath"
+}
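Review note: the loop in `run-docker-compose-windows.ps1` matches each line against `^([^=]+)=(.*)$` and exports both captures unchanged, so a value quoted as in `.env_template` would keep its surrounding single quotes. The Python sketch below mirrors that parsing for illustration only. `parse_env_lines` is a hypothetical helper that is not part of this commit, and the optional quote stripping is an assumption about what may be wanted, not something the script does.

```python
import re

# Same pattern as the PowerShell script: the name is everything before the
# first "=", the value is the rest of the line.
ENV_LINE = re.compile(r"^([^=]+)=(.*)$")


def parse_env_lines(lines, strip_quotes=False):
    """Return {name: value} for lines shaped like NAME=value.

    Hypothetical helper for illustration. With strip_quotes=False the value
    is kept exactly as captured, as in the script. With strip_quotes=True one
    pair of matching surrounding quotes is removed (an assumption, not the
    script's behavior).
    """
    env = {}
    for line in lines:
        match = ENV_LINE.match(line.rstrip("\r\n"))
        if not match:
            continue  # blank lines and lines without "=" are skipped
        name, value = match.group(1), match.group(2)
        if (
            strip_quotes
            and len(value) >= 2
            and value[0] == value[-1]
            and value[0] in "'\""
        ):
            value = value[1:-1]
        env[name] = value
    return env
```

Calling `parse_env_lines(["YOUTUBE_API_KEY='YOUR YOUTUBE_API_KEY'"])` keeps the quotes in the value, while passing `strip_quotes=True` drops them. Whether the downstream containers expect the quotes depends on how the variable is consumed.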
run-docker-compose.sh
ADDED
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+# Start Ollama
+ollama serve &
+
+# Wait for Ollama to start
+sleep 10
+
+# Run Phi model to ensure it's loaded
+ollama run phi "hello" &
+
+# Generate ground truth
+python generate_ground_truth.py
+
+# Run RAG evaluation
+python rag_evaluation.py
+
+# Start the Streamlit app
+streamlit run main.py
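Review note: `run-docker-compose.sh` waits for Ollama with a fixed `sleep 10`. One alternative is to poll until the server accepts TCP connections. The sketch below is a minimal illustration under assumptions: `wait_for_port` is a hypothetical helper that is not in this commit, and the host and port passed to it (for example `localhost` and 11434, Ollama's default port) would need to match the actual setup.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once a connection is accepted, or False if `timeout`
    seconds elapse first. Hypothetical helper for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_port("localhost", 11434)` could run before the ground-truth and evaluation steps. An open port only shows that the server is listening; it does not confirm that the `phi` model has finished loading.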