Add initial database setup and comprehensive documentation for SmartQuery application; include Dockerfile for containerized deployment
- Dockerfile +32 -0
- README.md +80 -0
- chainlit.md +33 -1
- database/Chinook_Sqlite.sql +0 -0
- old_app.py +0 -24
- smartquery/__init__.py +0 -0
- app.py → smartquery/app.py +9 -47
- smartquery/init_db.py +13 -0
- prompt_templates.py → smartquery/prompt_templates.py +0 -0
- sql_agent.py → smartquery/sql_agent.py +2 -2
Dockerfile
ADDED
@@ -0,0 +1,32 @@
+# Use the official Python image from Docker Hub
+FROM python:3.11
+
+# Create a new user with a specific UID and set it as the default user
+RUN useradd -m -u 1000 user
+
+# Switch to the new user
+USER user
+
+# Set environment variables
+ENV HOME=/home/user \
+    PATH=/home/user/.local/bin:$PATH
+
+# Set the working directory
+WORKDIR $HOME/app
+
+# Copy the requirements file and install the dependencies
+COPY --chown=user ./requirements.txt $HOME/app/requirements.txt
+RUN pip install --upgrade pip && \
+    pip install -r requirements.txt
+
+# Copy project files to the working directory
+COPY --chown=user . $HOME/app
+
+# Set PYTHONPATH to include the project directory
+ENV PYTHONPATH=$HOME/app
+
+# Initialize the database
+RUN python smartquery/init_db.py
+
+# Set the command to run the application
+CMD ["chainlit", "run", "smartquery/app.py", "--port", "7860"]
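For local testing, the image above can be built and run roughly like this (a sketch; the image tag `smartquery` and the local `.env` file name are assumptions, not part of the commit):

```shell
# Build the image from the repository root (the database is created at build time)
docker build -t smartquery .

# Run it, forwarding the Chainlit port and passing API keys from a local .env file
docker run --env-file .env -p 7860:7860 smartquery
```

Port 7860 matches the `--port` flag in the `CMD` line, which is the port Hugging Face Spaces expects a Docker app to listen on by default.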
README.md
CHANGED
@@ -4,3 +4,83 @@ colorFrom: blue
 colorTo: yellow
 sdk: docker
 pinned: false
+
+# SmartQuery
+
+SmartQuery is an intelligent assistant designed to provide seamless interaction with your database. Built on top of LangChain and Chainlit, and using the OpenAI API, SmartQuery allows users to query their database in natural language, through either text or voice commands.
+
+## Features
+
+- **Natural Language Querying:** Interact with your database in plain English; no SQL required.
+- **Voice Commands:** Ask questions out loud and get spoken responses.
+- **Rich Insights:** Get detailed answers and insights from your data.
+- **User-Friendly Interface:** Simple chat-based interaction for ease of use.
+
+## Setup
+
+Clone the repository:
+
+```bash
+git clone https://github.com/your-repo/SmartQuery.git
+cd SmartQuery
+```
+
+Install the required dependencies with pip:
+
+```bash
+pip install -r requirements.txt
+```
+
+Create and populate the database:
+
+```bash
+sqlite3 database/Chinook.db
+.read database/Chinook_Sqlite.sql
+.exit
+```
+
+Create a `.env` file and add your environment variables:
+
+```bash
+OPENAI_API_KEY=your-openai-key-here
+ELEVENLABS_API_KEY=your-elevenlabs-key-here
+ELEVENLABS_VOICE_ID=your-elevenlabs-voice-id-here
+```
+
+Run the application:
+
+```bash
+chainlit run smartquery/app.py
+```
+
+## Usage
+
+Start a chat session and ask questions about the content of the Chinook database. The application supports both text and voice input. For voice input, press the microphone button, speak your question, and let SmartQuery process your query. Complex questions can take some time to answer (usually under a minute), so please be patient.
+
+### Sample Questions to Try
+
+- "What is the most expensive track?"
+- "List the total sales per country. Which country's customers spent the most?"
+- "Who are the 3 most listened-to artists, and what is their average revenue?"
+
+## About the Database
+
+The database powering SmartQuery is the Chinook Database, a sample database representing a digital media store. It contains tables for:
+
+- **Artists**: Information about music artists.
+- **Albums**: Details of albums released by artists.
+- **Tracks**: Data on individual tracks, including their length and price.
+- **Genres**: Different genres of music.
+- **Customers**: Information about the customers of the store.
+- **Invoices**: Purchase records containing information on sales transactions.
+- **InvoiceLines**: Details about each item in an invoice.
+- **Employees**: Data on employees managing the store.
+- **Playlists**: User-generated playlists.
+- **PlaylistTracks**: Mapping of tracks to playlists.
+- **MediaTypes**: Types of media the tracks are available in.
+
+This structure lets you ask a wide range of questions about sales, customer preferences, artist performance, and more. Feel free to explore the richness of the data and uncover valuable insights.
+
+## Acknowledgements
+
+This project uses technologies including LangChain, OpenAI's GPT models, FAISS for vector storage, Eleven Labs for speech-to-text and text-to-speech, and Chainlit for building interactive AI applications. Thanks to all the open-source contributors and organizations that make these tools available.
chainlit.md
CHANGED
@@ -4,12 +4,44 @@ Hello and welcome to SmartQuery, your intelligent assistant designed to help you
 
 ## Getting Started
 
-
+### Sample Questions to Try
+
+- "What is the most expensive track?"
+- "List the total sales per country. Which country's customers spent the most?"
+- "Who are the 3 most listened-to artists, and what is their average revenue?"
+
+## About the Database 🎶💽
+
+The database powering SmartQuery is the Chinook Database, a sample database representing a digital media store. It contains tables for:
+
+- **Artists**: Information about music artists.
+- **Albums**: Details of albums released by artists.
+- **Tracks**: Data on individual tracks, including their length and price.
+- **Genres**: Different genres of music.
+- **Customers**: Information about the customers of the store.
+- **Invoices**: Purchase records containing information on sales transactions.
+- **InvoiceLines**: Details about each item in an invoice.
+- **Employees**: Data on employees managing the store.
+- **Playlists**: User-generated playlists.
+- **PlaylistTracks**: Mapping of tracks to playlists.
+- **MediaTypes**: Types of media the tracks are available in.
+
+This structure lets you ask a wide range of questions about sales, customer preferences, artist performance, and more. Feel free to explore the richness of the data and uncover valuable insights.
 
 ## How It Works 🧠
 
+SmartQuery is an agentic application that uses natural language processing to understand your queries in plain English and convert them into SQL queries that retrieve the desired information from the database. You can interact with SmartQuery via text or voice, making it versatile and user-friendly.
+
 ## The Tech Behind It 💡🤖
 
+SmartQuery is built using state-of-the-art technologies, including:
+
+- **LangChain**: For chaining together multiple language model prompts and responses.
+- **OpenAI GPT-4o**: For understanding and processing natural language queries.
+- **FAISS**: For efficient similarity search and retrieval.
+- **Chainlit**: For building the interactive front-end.
+- **Eleven Labs**: For speech-to-text and text-to-speech, enabling voice interactions.
+
 ## Ready to Query?
 
 With SmartQuery, transform your database interactions into a seamless experience. Dive into your data, uncover insights, and make data-driven decisions more effectively than ever before. Happy querying!
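To make the sample questions concrete: a request like "List the total sales per country" boils down to a JOIN plus GROUP BY over the Invoice and Customer tables. A minimal sketch against a toy two-table schema (not the full Chinook dump; table and column names follow Chinook's conventions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
INSERT INTO Customer VALUES (1, 'USA'), (2, 'Brazil');
INSERT INTO Invoice VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 8.0);
""")

# Total sales per country, biggest spender first
rows = conn.execute("""
    SELECT c.Country, SUM(i.Total) AS Sales
    FROM Invoice i JOIN Customer c ON i.CustomerId = c.CustomerId
    GROUP BY c.Country ORDER BY Sales DESC
""").fetchall()
conn.close()
print(rows)  # [('USA', 15.0), ('Brazil', 8.0)]
```

The agent's job is essentially to generate SQL of this shape from the English question and then summarize the result rows.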
database/Chinook_Sqlite.sql
ADDED
The diff for this file is too large to render.
old_app.py
DELETED
@@ -1,24 +0,0 @@
-import chainlit as cl
-from langchain.schema.runnable.config import RunnableConfig
-from sql_agent import SQLAgent
-
-# ChainLit Integration
-@cl.on_chat_start
-async def on_chat_start():
-    cl.user_session.set("agent", SQLAgent)
-
-@cl.on_message
-async def on_message(message: cl.Message):
-    agent = cl.user_session.get("agent")  # Get the agent from the session
-    cb = cl.AsyncLangchainCallbackHandler(stream_final_answer=True)
-    config = RunnableConfig(callbacks=[cb])
-
-    async with cl.Step(name="SmartQuery Agent", root=True) as step:
-        step.input = message.content
-        result = await agent.ainvoke(message.content, config=config)
-
-        # Assuming the result is a dictionary with a key 'output' containing the final answer
-        final_answer = result.get('output', 'No answer returned')
-
-        # Stream the final answer as a token to the step
-        await step.stream_token(final_answer)
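The deleted handler's core pattern (await `ainvoke`, then pull the answer out of the result dict) survives in the new `process_message`. A stripped-down sketch of that pattern, with a hypothetical stub standing in for the real LangChain agent:

```python
import asyncio

class StubAgent:
    """Stand-in for the real SQL agent (illustrative only)."""
    async def ainvoke(self, text, config=None):
        # Real agents return a dict; the final answer lives under 'output'
        return {"output": f"answer to: {text}"}

async def ask(agent, question):
    result = await agent.ainvoke(question)
    # Same defaulting the app uses when no answer comes back
    return result.get("output", "No answer returned")

answer = asyncio.run(ask(StubAgent(), "What is the most expensive track?"))
print(answer)  # answer to: What is the most expensive track?
```

Using `.get('output', ...)` keeps the handler from raising when an agent run fails to produce an answer key.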
smartquery/__init__.py
ADDED
File without changes
app.py → smartquery/app.py
RENAMED
@@ -4,7 +4,7 @@ import chainlit as cl
 import httpx
 from dotenv import load_dotenv
 from langchain.schema.runnable.config import RunnableConfig
-from sql_agent import SQLAgent
+from smartquery.sql_agent import SQLAgent
 from openai import AsyncOpenAI
 from chainlit.element import Audio
 
@@ -29,47 +29,13 @@ async def speech_to_text(audio_file):
 
 @cl.step(type="tool")
 async def generate_text_answer(transcription, images):
-    model = "gpt-
+    model = "gpt-4o"
     messages = [{"role": "user", "content": transcription}]
     response = await client.chat.completions.create(
        messages=messages, model=model, temperature=0.3
    )
    return response.choices[0].message.content
 
-@cl.step(type="tool")
-async def text_to_speech(text: str, mime_type: str):
-    CHUNK_SIZE = 1024
-    url = f"https://api.elevenlabs.io/v1/text-to-speech/{ELEVENLABS_VOICE_ID}"
-
-    headers = {
-        "Accept": mime_type,
-        "Content-Type": "application/json",
-        "xi-api-key": ELEVENLABS_API_KEY
-    }
-
-    data = {
-        "text": text,
-        "model_id": "eleven_monolingual_v1",
-        "voice_settings": {
-            "stability": 0.5,
-            "similarity_boost": 0.5
-        }
-    }
-
-    async with httpx.AsyncClient(timeout=25.0) as client:
-        response = await client.post(url, json=data, headers=headers)
-        response.raise_for_status()  # Ensure we notice bad responses
-
-        buffer = BytesIO()
-        buffer.name = f"output_audio.{mime_type.split('/')[1]}"
-
-        async for chunk in response.aiter_bytes(chunk_size=CHUNK_SIZE):
-            if chunk:
-                buffer.write(chunk)
-
-        buffer.seek(0)
-        return buffer.name, buffer.read()
-
 @cl.on_chat_start
 async def on_chat_start():
     cl.user_session.set("agent", SQLAgent)
@@ -114,6 +80,11 @@ async def on_audio_end(elements: list[Audio]):
 
     await process_message(transcription, answer_message, audio_mime_type)
 
+    # Reset audio buffer and mime type
+    cl.user_session.set("audio_buffer", None)
+    cl.user_session.set("audio_mime_type", None)
+    print("Audio buffer reset")
+
 async def process_message(content: str, answer_message=None, mime_type=None):
     agent = cl.user_session.get("agent")
     cb = cl.AsyncLangchainCallbackHandler(stream_final_answer=True)
@@ -127,15 +98,6 @@ async def process_message(content: str, answer_message=None, mime_type=None):
 
         await step.stream_token(final_answer)
 
-        if
-
-            output_audio_el = Audio(
-                name=output_name,
-                auto_play=True,
-                mime=mime_type,
-                content=output_audio,
-            )
-            answer_message.elements = [output_audio_el]
+        if answer_message:
+            answer_message.content = final_answer
             await answer_message.update()
-        else:
-            await cl.Message(content=final_answer).send()
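The import change from `sql_agent` to `smartquery.sql_agent` only resolves if the project root is importable, which is why the Dockerfile sets `PYTHONPATH=$HOME/app`. A small sketch of the same mechanism; the throwaway package built in a temp directory is purely for illustration:

```python
import os
import sys
import tempfile

# Build a throwaway package that mimics the new layout:
#   <root>/smartquery/__init__.py
#   <root>/smartquery/sql_agent.py
root = tempfile.mkdtemp()
pkg = os.path.join(root, "smartquery")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "sql_agent.py"), "w") as f:
    f.write("SQLAgent = 'agent-placeholder'\n")

# Equivalent of ENV PYTHONPATH=$HOME/app inside the container
sys.path.insert(0, root)

from smartquery.sql_agent import SQLAgent
print(SQLAgent)  # agent-placeholder
```

Without the `sys.path` (or `PYTHONPATH`) entry, the absolute import raises `ModuleNotFoundError`, which is exactly the failure mode the Dockerfile change guards against.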
smartquery/init_db.py
ADDED
@@ -0,0 +1,13 @@
+import sqlite3
+
+def initialize_database():
+    conn = sqlite3.connect('database/Chinook.db')
+    cursor = conn.cursor()
+    with open('database/Chinook_Sqlite.sql', 'r') as f:
+        sql_script = f.read()
+    cursor.executescript(sql_script)
+    conn.commit()
+    conn.close()
+
+if __name__ == "__main__":
+    initialize_database()
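`init_db.py` follows the standard `sqlite3` pattern: read the SQL dump once and feed it to `executescript`, which runs every statement in the file. A self-contained sketch of the same pattern against a tiny stand-in script (the paths and demo schema here are illustrative, not the real Chinook dump):

```python
import os
import sqlite3
import tempfile

def initialize_database(db_path, script_path):
    conn = sqlite3.connect(db_path)
    try:
        with open(script_path) as f:
            conn.executescript(f.read())  # run all statements in the file
        conn.commit()
    finally:
        conn.close()

# Tiny stand-in for database/Chinook_Sqlite.sql
workdir = tempfile.mkdtemp()
script_path = os.path.join(workdir, "schema.sql")
with open(script_path, "w") as f:
    f.write("CREATE TABLE Artist (Name TEXT);"
            "INSERT INTO Artist VALUES ('Queen');")

db_path = os.path.join(workdir, "demo.db")
initialize_database(db_path, script_path)

# Verify the database was populated
conn = sqlite3.connect(db_path)
name = conn.execute("SELECT Name FROM Artist").fetchone()[0]
conn.close()
print(name)  # Queen
```

Because the Dockerfile runs `python smartquery/init_db.py` at build time, the populated database file is baked into the image; rebuilding the image recreates it from the dump.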
prompt_templates.py → smartquery/prompt_templates.py
RENAMED
File without changes
sql_agent.py → smartquery/sql_agent.py
RENAMED
@@ -9,14 +9,14 @@ from langchain_core.prompts import ChatPromptTemplate, FewShotPromptTemplate, Me
 from langchain_openai import OpenAIEmbeddings
 from langchain_openai import ChatOpenAI
 from langchain_community.utilities import SQLDatabase
-from prompt_templates import few_shot_examples, system_prefix
+from smartquery.prompt_templates import few_shot_examples, system_prefix
 
 
 # Load the .env file
 load_dotenv()
 
 # Initialize the SQL database
-db = SQLDatabase.from_uri("sqlite:///Chinook.db")
+db = SQLDatabase.from_uri("sqlite:///database/Chinook.db")
 
 # Check the database connection
 print(db.dialect)