JulsdL committed on
Commit
adf8836
·
1 Parent(s): 395355e

Add initial database setup and comprehensive documentation for SmartQuery application; include Dockerfile for containerized deployment

Dockerfile ADDED
@@ -0,0 +1,32 @@
+ # Use the official Python image from Docker Hub
+ FROM python:3.11
+
+ # Create a new user with a specific UID and set it as the default user
+ RUN useradd -m -u 1000 user
+
+ # Switch to the new user
+ USER user
+
+ # Set environment variables
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ # Set the working directory
+ WORKDIR $HOME/app
+
+ # Copy the requirements file and install the dependencies
+ COPY --chown=user ./requirements.txt $HOME/app/requirements.txt
+ RUN pip install --upgrade pip && \
+     pip install -r requirements.txt
+
+ # Copy project files to the working directory
+ COPY --chown=user . $HOME/app
+
+ # Set PYTHONPATH to include the project directory
+ ENV PYTHONPATH=$HOME/app
+
+ # Initialize the database
+ RUN python smartquery/init_db.py
+
+ # Set the command to run the application
+ CMD ["chainlit", "run", "smartquery/app.py", "--port", "7860"]
README.md CHANGED
@@ -4,3 +4,83 @@ colorFrom: blue
  colorTo: yellow
  sdk: docker
  pinned: false
+
+ # SmartQuery
+
+ SmartQuery is an intelligent assistant designed to provide seamless interaction with your database. Built on top of LangChain and Chainlit, and using the OpenAI API, SmartQuery allows users to query their database in natural language, through text or voice commands.
+
+ ## Features
+
+ - **Natural Language Querying:** Interact with your database in plain English; no SQL required.
+ - **Voice Commands:** Ask questions out loud and get spoken responses.
+ - **Rich Insights:** Get detailed answers and insights from your data.
+ - **User-Friendly Interface:** Simple chat-based interaction for ease of use.
+
+ ## Setup
+
+ Clone the repository:
+
+ ```bash
+ git clone https://github.com/your-repo/SmartQuery.git
+ cd SmartQuery
+ ```
+
+ Install the required dependencies with pip:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ Create and populate the database:
+
+ ```bash
+ sqlite3 database/Chinook.db
+ .read database/Chinook_Sqlite.sql
+ .exit
+ ```
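If you prefer to initialize the database from Python instead of the `sqlite3` shell, the same steps can be sketched with the standard-library `sqlite3` module. Note that `load_schema` is an illustrative helper for this README, not code from the repository:

```python
import sqlite3

def load_schema(db_path: str, script_path: str) -> None:
    """Create (or open) the database file and run the schema script against it."""
    with open(script_path, "r", encoding="utf-8") as f:
        script = f.read()
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(script)  # executes every statement in the .sql file
        conn.commit()
    finally:
        conn.close()

# Example (paths from this repository's layout):
# load_schema("database/Chinook.db", "database/Chinook_Sqlite.sql")
```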
+
+ Create a `.env` file and add your environment variables:
+
+ ```bash
+ OPENAI_API_KEY=your-openai-key-here
+ ELEVENLABS_API_KEY=your-elevenlabs-key-here
+ ELEVENLABS_VOICE_ID=your-elevenlabs-voice-id-here
+ ```
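The application reads these values at startup via python-dotenv's `load_dotenv()`. The effect is roughly what this stdlib-only sketch does; `parse_env_line` is a hypothetical helper and the parsing is deliberately simplified:

```python
import os

def parse_env_line(line):
    """Parse one KEY=value line from a .env file; return None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, _, value = line.partition("=")
    return key.strip(), value.strip()

# Existing environment variables win, mirroring load_dotenv's default behavior.
for parsed in map(parse_env_line, ["SMARTQUERY_DEMO_KEY=abc123", "# a comment"]):
    if parsed:
        os.environ.setdefault(*parsed)
```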
+
+ Run the application with:
+
+ ```bash
+ chainlit run smartquery/app.py
+ ```
+
+ ## Usage
+
+ Start a chat session and ask questions about the contents of the Chinook database. The application supports both text and voice input. For voice input, press the microphone button, speak your question, and let SmartQuery process your query. Complex questions can take some time to answer (usually under a minute), so please be patient.
+
+ ### Sample Questions to Try:
+
+ - "What is the most expensive track?"
+ - "List the total sales per country. Which country's customers spent the most?"
+ - "Who are the 3 most listened-to artists, and what is their average revenue?"
+
+ ## About the Database
+
+ The database powering SmartQuery is the Chinook Database, a sample database representing a digital media store. It contains tables for:
+
+ - **Artists**: Information about music artists.
+ - **Albums**: Details of albums released by artists.
+ - **Tracks**: Data on individual tracks, including their length and price.
+ - **Genres**: Different genres of music.
+ - **Customers**: Information about the customers of the store.
+ - **Invoices**: Purchase records containing information on sales transactions.
+ - **InvoiceLines**: Details about each item in an invoice.
+ - **Employees**: Data on employees managing the store.
+ - **Playlists**: User-generated playlists.
+ - **PlaylistTracks**: Mapping of tracks to playlists.
+ - **MediaTypes**: Types of media the tracks are available in.
+
+ This structure allows you to ask a wide range of questions about sales, customer preferences, artist performance, and more. Feel free to explore the richness of the data and uncover valuable insights.
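After initialization, one quick local sanity check (illustrative, not part of the project) is listing the tables the agent will query against:

```python
import sqlite3

def list_tables(db_path):
    """Return the names of all tables in a SQLite database, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()

# list_tables("database/Chinook.db")  # should include the tables described above
```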
+
+ ## Acknowledgements
+
+ This project uses technologies including LangChain, OpenAI's GPT models, FAISS for vector storage, Eleven Labs for speech-to-text and text-to-speech, and Chainlit for building interactive AI applications. Thanks to all open-source contributors and organizations that make these tools available.
chainlit.md CHANGED
@@ -4,12 +4,44 @@ Hello and welcome to SmartQuery, your intelligent assistant designed to help you
 
  ## Getting Started
 
- ## Sample Questions to Try:
+ ### Sample Questions to Try:
+
+ - "What is the most expensive track?"
+ - "List the total sales per country. Which country's customers spent the most?"
+ - "Who are the 3 most listened-to artists, and what is their average revenue?"
+
+ ## About the Database 🎶💽
+
+ The database powering SmartQuery is the Chinook Database, a sample database representing a digital media store. It contains tables for:
+
+ - **Artists**: Information about music artists.
+ - **Albums**: Details of albums released by artists.
+ - **Tracks**: Data on individual tracks, including their length and price.
+ - **Genres**: Different genres of music.
+ - **Customers**: Information about the customers of the store.
+ - **Invoices**: Purchase records containing information on sales transactions.
+ - **InvoiceLines**: Details about each item in an invoice.
+ - **Employees**: Data on employees managing the store.
+ - **Playlists**: User-generated playlists.
+ - **PlaylistTracks**: Mapping of tracks to playlists.
+ - **MediaTypes**: Types of media the tracks are available in.
+
+ This structure allows you to ask a wide range of questions about sales, customer preferences, artist performance, and more. Feel free to explore the richness of the data and uncover valuable insights.
 
  ## How It Works 🧠
 
+ SmartQuery is an agentic application that leverages advanced natural language processing (NLP) techniques to understand your queries in plain English and convert them into SQL queries that retrieve the desired information from the database. You can interact with SmartQuery via text or voice, making it versatile and user-friendly.
+
  ## The Tech Behind It 💡🤖
 
+ SmartQuery is built using state-of-the-art technologies, including:
+
+ - **LangChain**: For chaining together multiple language model prompts and responses.
+ - **OpenAI GPT-4o**: For understanding and processing natural language queries.
+ - **FAISS**: For efficient similarity search and retrieval.
+ - **Chainlit**: For building the front end of interactive AI applications.
+ - **Eleven Labs**: For speech-to-text and text-to-speech functionality, enabling voice interactions.
+
  ## Ready to Query?
 
  With SmartQuery, transform your database interactions into a seamless experience. Dive into your data, uncover insights, and make data-driven decisions more effectively than ever before. Happy querying!
database/Chinook_Sqlite.sql ADDED
The diff for this file is too large to render. See raw diff
 
old_app.py DELETED
@@ -1,24 +0,0 @@
- import chainlit as cl
- from langchain.schema.runnable.config import RunnableConfig
- from sql_agent import SQLAgent
-
- # ChainLit Integration
- @cl.on_chat_start
- async def on_chat_start():
-     cl.user_session.set("agent", SQLAgent)
-
- @cl.on_message
- async def on_message(message: cl.Message):
-     agent = cl.user_session.get("agent")  # Get the agent from the session
-     cb = cl.AsyncLangchainCallbackHandler(stream_final_answer=True)
-     config = RunnableConfig(callbacks=[cb])
-
-     async with cl.Step(name="SmartQuery Agent", root=True) as step:
-         step.input = message.content
-         result = await agent.ainvoke(message.content, config=config)
-
-         # Assuming the result is a dictionary with a key 'output' containing the final answer
-         final_answer = result.get('output', 'No answer returned')
-
-         # Stream the final answer as a token to the step
-         await step.stream_token(final_answer)
smartquery/__init__.py ADDED
File without changes
app.py β†’ smartquery/app.py RENAMED
@@ -4,7 +4,7 @@ import chainlit as cl
  import httpx
  from dotenv import load_dotenv
  from langchain.schema.runnable.config import RunnableConfig
- from sql_agent import SQLAgent
+ from smartquery.sql_agent import SQLAgent
  from openai import AsyncOpenAI
  from chainlit.element import Audio
 
@@ -29,47 +29,13 @@ async def speech_to_text(audio_file):
 
  @cl.step(type="tool")
  async def generate_text_answer(transcription, images):
-     model = "gpt-4-turbo"
+     model = "gpt-4o"
      messages = [{"role": "user", "content": transcription}]
      response = await client.chat.completions.create(
          messages=messages, model=model, temperature=0.3
      )
      return response.choices[0].message.content
 
- @cl.step(type="tool")
- async def text_to_speech(text: str, mime_type: str):
-     CHUNK_SIZE = 1024
-     url = f"https://api.elevenlabs.io/v1/text-to-speech/{ELEVENLABS_VOICE_ID}"
-
-     headers = {
-         "Accept": mime_type,
-         "Content-Type": "application/json",
-         "xi-api-key": ELEVENLABS_API_KEY
-     }
-
-     data = {
-         "text": text,
-         "model_id": "eleven_monolingual_v1",
-         "voice_settings": {
-             "stability": 0.5,
-             "similarity_boost": 0.5
-         }
-     }
-
-     async with httpx.AsyncClient(timeout=25.0) as client:
-         response = await client.post(url, json=data, headers=headers)
-         response.raise_for_status()  # Ensure we notice bad responses
-
-     buffer = BytesIO()
-     buffer.name = f"output_audio.{mime_type.split('/')[1]}"
-
-     async for chunk in response.aiter_bytes(chunk_size=CHUNK_SIZE):
-         if chunk:
-             buffer.write(chunk)
-
-     buffer.seek(0)
-     return buffer.name, buffer.read()
-
  @cl.on_chat_start
  async def on_chat_start():
      cl.user_session.set("agent", SQLAgent)
@@ -114,6 +80,11 @@ async def on_audio_end(elements: list[Audio]):
 
      await process_message(transcription, answer_message, audio_mime_type)
 
+     # Reset audio buffer and mime type
+     cl.user_session.set("audio_buffer", None)
+     cl.user_session.set("audio_mime_type", None)
+     print("Audio buffer reset")
+
  async def process_message(content: str, answer_message=None, mime_type=None):
      agent = cl.user_session.get("agent")
      cb = cl.AsyncLangchainCallbackHandler(stream_final_answer=True)
@@ -127,15 +98,6 @@ async def process_message(content: str, answer_message=None, mime_type=None):
 
      await step.stream_token(final_answer)
 
-     if mime_type:
-         output_name, output_audio = await text_to_speech(final_answer, mime_type)
-         output_audio_el = Audio(
-             name=output_name,
-             auto_play=True,
-             mime=mime_type,
-             content=output_audio,
-         )
-         answer_message.elements = [output_audio_el]
+     if answer_message:
+         answer_message.content = final_answer
          await answer_message.update()
-     else:
-         await cl.Message(content=final_answer).send()
 
smartquery/init_db.py ADDED
@@ -0,0 +1,13 @@
+ import sqlite3
+
+ def initialize_database():
+     conn = sqlite3.connect('database/Chinook.db')
+     cursor = conn.cursor()
+     with open('database/Chinook_Sqlite.sql', 'r') as f:
+         sql_script = f.read()
+     cursor.executescript(sql_script)
+     conn.commit()
+     conn.close()
+
+ if __name__ == "__main__":
+     initialize_database()
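One caveat with running `init_db.py` at image build time: `executescript` will fail if the schema objects already exist. A guarded variant could skip initialization when the database file is already present. This is a sketch, not part of the commit, and it assumes a non-empty file means an initialized database:

```python
import os
import sqlite3

def init_if_missing(db_path, script_path):
    """Run the schema script only when the database file is absent or empty.

    Returns True if the script was executed, False if initialization was skipped.
    """
    if os.path.exists(db_path) and os.path.getsize(db_path) > 0:
        return False
    with open(script_path, "r", encoding="utf-8") as f:
        script = f.read()
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(script)
        conn.commit()
    finally:
        conn.close()
    return True
```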
prompt_templates.py β†’ smartquery/prompt_templates.py RENAMED
File without changes
sql_agent.py β†’ smartquery/sql_agent.py RENAMED
@@ -9,14 +9,14 @@ from langchain_core.prompts import ChatPromptTemplate, FewShotPromptTemplate, Me
  from langchain_openai import OpenAIEmbeddings
  from langchain_openai import ChatOpenAI
  from langchain_community.utilities import SQLDatabase
- from prompt_templates import few_shot_examples, system_prefix
+ from smartquery.prompt_templates import few_shot_examples, system_prefix
 
 
  # Load the .env file
  load_dotenv()
 
  # Initialize the SQL database
- db = SQLDatabase.from_uri("sqlite:///Chinook.db")
+ db = SQLDatabase.from_uri("sqlite:///database/Chinook.db")
 
  # Check the database connection
  print(db.dialect)