Implement Flashcard creation tool and update project naming
- Added FlashcardTool in notebook_tutor/tools.py for generating flashcards in CSV format, suitable for Anki import.
- Introduced FlashcardInput model to define the structure of flashcards input.
- Created TutorState in notebook_tutor/states.py to manage application states including flashcards creation status.
- Updated README.md to reflect the project's new name, AI-Notebook-Tutor, and updated instructions for running the application.
- Removed the obsolete main.py.
- README.md +3 -3
- flashcards/flashcards_03fb423e-2087-4598-9ebb-b99a20db0b93.csv +11 -0
- flashcards/flashcards_30a9e627-3d47-45fc-81db-a991269edb24.csv +11 -0
- flashcards/flashcards_37136597-b97c-4889-9dc1-c3b5e10084c1.csv +11 -0
- flashcards/flashcards_4dbaa0d1-6363-4584-9691-824540220571.csv +11 -0
- flashcards/flashcards_52137f3d-8a8d-4891-9216-ab3bbc6ee66a.csv +11 -0
- flashcards/flashcards_545f9f54-f8b0-4549-b27d-66b7303a017e.csv +11 -0
- flashcards/flashcards_af3650ed-34b5-4040-b672-036e1cf3b8e3.csv +11 -0
- main.py +0 -118
- {aims_tutor → notebook_tutor}/__init__.py +0 -0
- {aims_tutor → notebook_tutor}/app.py +1 -1
- {aims_tutor → notebook_tutor}/chainlit_frontend.py +53 -14
- {aims_tutor → notebook_tutor}/document_processing.py +1 -1
- {aims_tutor → notebook_tutor}/graph.py +56 -37
- {aims_tutor → notebook_tutor}/prompt_templates.py +0 -0
- {aims_tutor → notebook_tutor}/retrieval.py +0 -0
- notebook_tutor/states.py +12 -0
- notebook_tutor/tools.py +43 -0
- {aims_tutor → notebook_tutor}/utils.py +0 -0
README.md (CHANGED)

````diff
@@ -1,8 +1,8 @@
-# …
+# AI-Notebook-Tutor
 
 # RAG Application for QA in Jupyter Notebook
 
-…
+AI-Notebook-Tutor is designed to provide question-answering capabilities in a Jupyter Notebook using the Retrieval Augmented Generation (RAG) model. It's built on top of the LangChain and Chainlit platforms, and it uses the OpenAI API for the chat model.
 
 ## Features
 
@@ -28,7 +28,7 @@ OPENAI_API_KEY=your-key-here
 4. Run the application using the following command:
 
 ```bash
-chainlit run …
+chainlit run notebook_tutor/app.py
 ```
 
 ## Usage
````
flashcards/flashcards_03fb423e-2087-4598-9ebb-b99a20db0b93.csv (ADDED)

```diff
@@ -0,0 +1,11 @@
+Front,Back
+What is the first step in the retriever creation process?,The first step is loading a fine-tuned embedding model from the hub using the HuggingFaceEmbeddings class. The model used is named 'JulsdL/e2erag-arctic-m' and it is configured to run on a CUDA device.
+Which class is used to load the embedding model in the retriever creation process?,The HuggingFaceEmbeddings class is used to load the embedding model.
+What is the name of the embedding model used in the retriever creation process?,The embedding model used is named 'JulsdL/e2erag-arctic-m'.
+On which device is the embedding model 'JulsdL/e2erag-arctic-m' configured to run?,The embedding model is configured to run on a CUDA device.
+What is the purpose of setting up a VectorStore in the retriever creation process?,The VectorStore is set up to power the dense vector search and is populated with documents that have been split into chunks and embedded using the embedding model.
+Which tool is used to set up the VectorStore in the retriever creation process?,Meta's FAISS (Facebook AI Similarity Search) is used to set up the VectorStore.
+How are documents prepared for the VectorStore in the retriever creation process?,Documents are split into chunks and embedded using the previously loaded embedding model before being added to the VectorStore.
+What is the final step in the retriever creation process?,The final step is converting the VectorStore into a retriever that can fetch relevant documents or chunks based on query embeddings.
+"In the context of retriever creation, what is the purpose of converting the VectorStore to a retriever?",Converting the VectorStore to a retriever allows it to be used for fetching relevant documents or chunks based on the query embeddings.
+How does a retriever enhance the model's ability in a Retrieval-Augmented Generation (RAG) setup?,"A retriever enhances the model's ability by providing contextually relevant responses based on the input queries, improving the quality and relevance of the generated content."
```
flashcards/flashcards_30a9e627-3d47-45fc-81db-a991269edb24.csv (ADDED)

```diff
@@ -0,0 +1,11 @@
+Front,Back
+What command is used to clone a repository from GitHub?,!git clone https://github.com/arcee-ai/DALM
+How do you install a package using pip and upgrade it if necessary?,!pip install --upgrade -q -e .
+"Which command is used to install the latest versions of langchain, langchain-core, langchain-community, and sentence_transformers?",!pip install -qU langchain langchain-core langchain-community sentence_transformers
+How do you install the pymupdf and faiss-cpu libraries using pip?,!pip install -qU pymupdf faiss-cpu
+What is the import statement for pandas?,import pandas as pd
+How do you import HuggingFaceEmbeddings from the langchain_community library?,from langchain_community.embeddings import HuggingFaceEmbeddings
+What is the import statement for FAISS from the langchain_community library?,from langchain_community.vectorstores import FAISS
+Which import statement is used for SimpleDirectoryReader from llama_index.core?,from llama_index.core import SimpleDirectoryReader
+How do you import SimpleNodeParser from the llama_index.core.node_parser module?,from llama_index.core.node_parser import SimpleNodeParser
+What is the import statement for MetadataMode from llama_index.core.schema?,from llama_index.core.schema import MetadataMode
```
flashcards/flashcards_37136597-b97c-4889-9dc1-c3b5e10084c1.csv (ADDED)

```diff
@@ -0,0 +1,11 @@
+Front,Back
+What is the first step in the retriever creation process?,The first step in the retriever creation process is loading the embedding model.
+Which model is used as the embedding model in the retriever creation process?,The embedding model used is 'JulsdL/e2erag-arctic-m'.
+Which class and module are used to load the embedding model?,The embedding model is loaded using the HuggingFaceEmbeddings class from the langchain_community.embeddings module.
+On which device is the embedding model configured to run?,The embedding model is configured to run on a CUDA device.
+What is the purpose of the VectorStore in the retriever creation process?,The VectorStore is created to power dense vector searches.
+Which technology is used to set up the VectorStore?,Meta's FAISS (Facebook AI Similarity Search) is used to set up the VectorStore.
+How are the documents prepared for creating the VectorStore?,The documents are split into chunks before creating the VectorStore.
+Which method is called to create the VectorStore from the documents and embedding model?,The method FAISS.from_documents is called to create the VectorStore.
+How is the VectorStore converted into a retriever?,The VectorStore is converted into a retriever by invoking the as_retriever() method on the vector_store object.
+What are the benefits of the retriever created through this process?,"The retriever leverages the power of dense embeddings and efficient search capabilities provided by FAISS, making it effective for retrieval-augmented tasks."
```
flashcards/flashcards_4dbaa0d1-6363-4584-9691-824540220571.csv (ADDED)

```diff
@@ -0,0 +1,11 @@
+Front,Back
+What is the first step in creating a retriever for a document?,The first step is loading the documents using a loader like `PyMuPDFLoader` to load documents from a PDF file.
+Which function is used to load documents from a PDF file?,The function `PyMuPDFLoader` is used to load documents from a PDF file.
+How are documents split into chunks for processing?,"Documents are split into chunks using the `RecursiveCharacterTextSplitter`, which divides the documents based on predefined rules like token limits and split characters."
+What is the purpose of chunking documents?,Chunking documents into smaller parts helps in managing and processing them more efficiently for embedding and retrieval.
+Which library is used for loading a pre-trained embedding model?,The `HuggingFaceEmbeddings` library is used for loading a pre-trained embedding model.
+How do you specify the device for running the embedding model?,"You specify the device (e.g., 'cuda') in the `model_kwargs` parameter when loading the embedding model."
+What is Meta's FAISS used for in the retriever creation process?,Meta's FAISS is used to power dense vector search by creating a vector store from the document chunks and embedding model.
+How do you convert a vector store into a retriever?,You convert a vector store into a retriever using the `as_retriever` method.
+What is the role of the retriever in the retrieval-augmented generation (RAG) system?,"The retriever fetches relevant document chunks based on query vectors, providing contextually relevant responses."
+Which embedding model is used in the example provided for retriever creation?,The embedding model 'JulsdL/e2erag-arctic-m' from Hugging Face is used in the example.
```
flashcards/flashcards_52137f3d-8a8d-4891-9216-ab3bbc6ee66a.csv (ADDED)

```diff
@@ -0,0 +1,11 @@
+Front,Back
+What library is used to load the fine-tuned embedding model for retriever creation?,The HuggingFaceEmbeddings class is used to load the fine-tuned embedding model.
+What model is loaded for embedding in the retriever creation process?,The model 'JulsdL/e2erag-arctic-m' is loaded for embedding.
+On which device is the embedding model set to run?,The embedding model is set to run on a CUDA device.
+Which class is used to set up the vector store in retriever creation?,The FAISS class from Meta is used to set up the vector store.
+What is the purpose of the vector store in the retriever creation process?,The vector store powers the dense vector search by storing documents that have been split into chunks and embedded.
+How is the vector store created in the retriever creation process?,The vector store is created from documents that have been split into chunks and embedded using the loaded embedding model.
+How is the vector store converted into a retriever?,The vector store is converted into a retriever using the 'as_retriever()' method.
+What is a retriever used for in the context of retriever creation?,A retriever is used to retrieve context based on a query for a language model.
+What is the benefit of creating a retriever in a RAG setup?,Creating a retriever enhances the capabilities of language models by providing them with relevant context for generating responses.
+What does 'RAG' stand for in the context of retriever creation?,'RAG' stands for Retrieval-Augmented Generation.
```
flashcards/flashcards_545f9f54-f8b0-4549-b27d-66b7303a017e.csv (ADDED)

```diff
@@ -0,0 +1,11 @@
+Front,Back
+What tool is used to load documents from a PDF file in the retriever creation process?,The PyMuPDFLoader is used to load documents from a PDF file.
+What is the purpose of the RecursiveCharacterTextSplitter in the retriever creation process?,The RecursiveCharacterTextSplitter is used to divide the documents into manageable chunks based on specific rules such as token limits and preferred split characters.
+Which library is used to create a dense vector store in the retriever creation process?,Meta's FAISS library is used to create a dense vector store from the document chunks and the embedding model.
+How do you load a pre-trained embedding model from Hugging Face to run on a CUDA device?,Use the HuggingFaceEmbeddings class with the model name and specify the device as CUDA in the model_kwargs parameter.
+What is the role of the embedding model in the retriever creation process?,The embedding model is used to convert text into vectors that can be stored in the vector store for efficient retrieval.
+How do you convert a vector store into a retriever?,Use the as_retriever method on the vector store to convert it into a retriever.
+What is the first step in the retriever creation process?,The first step is loading the documents using PyMuPDFLoader.
+Which model name is used in the provided example for the embedding model?,"The model name used is ""JulsdL/e2erag-arctic-m""."
+What is the purpose of the vector store in the retriever creation process?,"The vector store holds the dense vector representations of document chunks, allowing for efficient retrieval based on query embeddings."
+Which Python import is necessary to use FAISS for creating a vector store?,You need to import FAISS from langchain_community.vectorstores.
```
flashcards/flashcards_af3650ed-34b5-4040-b672-036e1cf3b8e3.csv (ADDED)

```diff
@@ -0,0 +1,11 @@
+Front,Back
+What is the first step in creating a retriever?,The first step in creating a retriever is loading a fine-tuned embedding model from the hub using the HuggingFaceEmbeddings class.
+Which class is used to load the embedding model?,The HuggingFaceEmbeddings class is used to load the embedding model.
+How is the model 'JulsdL/e2erag-arctic-m' loaded onto the GPU?,"The model 'JulsdL/e2erag-arctic-m' is loaded onto the GPU by setting the model_kwargs parameter to {""device"": ""cuda""} in the HuggingFaceEmbeddings class."
+What library is used to set up the vector store?,Meta's FAISS library is used to set up the vector store.
+What does the vector store manage?,The vector store manages the dense vectors generated by the embedding model.
+How is the vector store converted into a retriever?,The vector store is converted into a retriever using the as_retriever() method.
+What is the purpose of the retriever in this context?,The purpose of the retriever is to efficiently fetch relevant document vectors based on the query vectors.
+What is the role of the embedding model in the retriever creation process?,"The embedding model generates dense vectors for the documents, which are then managed by the vector store and used by the retriever to find relevant information."
+Which parameter in the HuggingFaceEmbeddings class specifies the device to be used?,"The model_kwargs parameter specifies the device to be used, such as ""cuda"" for GPU."
+What does the FAISS library stand for?,FAISS stands for Facebook AI Similarity Search.
```
main.py (DELETED)

```diff
@@ -1,118 +0,0 @@
-import os
-from operator import itemgetter
-
-import chainlit as cl
-import tiktoken
-from dotenv import load_dotenv
-
-
-from langchain.text_splitter import RecursiveCharacterTextSplitter
-from langchain.retrievers import MultiQueryRetriever
-from langchain_core.prompts import ChatPromptTemplate
-from langchain_core.runnables import RunnablePassthrough
-from langchain_community.document_loaders import PyMuPDFLoader, PythonLoader, NotebookLoader
-from langchain_community.vectorstores import Qdrant
-from langchain_openai import ChatOpenAI
-from langchain_openai.embeddings import OpenAIEmbeddings
-
-# Load environment variables
-load_dotenv()
-
-# Configuration for OpenAI
-OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
-openai_chat_model = ChatOpenAI(model="gpt-4-turbo", temperature=0.1)
-
-# Define the RAG prompt
-RAG_PROMPT = """
-CONTEXT:
-{context}
-
-QUERY:
-{question}
-
-Answer the query in a pretty format if the context is related to it; otherwise, answer: 'Sorry, I can't answer.'
-"""
-rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
-
-
-# ChainLit setup for chat interaction
-@cl.on_chat_start
-async def start_chat():
-    settings = {
-        "model": "gpt-3.5-turbo",
-        "temperature": 0,
-        "top_p": 1,
-        "frequency_penalty": 0,
-        "presence_penalty": 0,
-    }
-    cl.user_session.set("settings", settings)
-
-    # Display a welcoming message with instructions
-    welcome_message = "Welcome to the AIMS-Tutor! Please upload a Jupyter notebook (.ipynb and max. 5mb) to start."
-    await cl.Message(content=welcome_message).send()
-
-    # Wait for the user to upload a file
-    files = None
-    while files is None:
-        files = await cl.AskFileMessage(
-            content="Please upload a Jupyter notebook (.ipynb, max. 5mb):",
-            accept={"application/x-ipynb+json": [".ipynb"]},
-            max_size_mb=5
-        ).send()
-
-    file = files[0]  # Get the first file
-
-    if file:
-        # Load the Jupyter notebook
-        notebook_path = file.path  # Extract the path from the AskFileResponse object
-
-        loader = NotebookLoader(
-            notebook_path,
-            include_outputs=False,
-            max_output_length=20,
-            remove_newline=True,
-            traceback=False
-        )
-        docs = loader.load()
-        cl.user_session.set("docs", docs)  # Store the docs in the user session
-
-        # Initialize the retriever components after loading document
-        text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=50, length_function=tiktoken_len)  # Initialize the text splitter
-        split_chunks = text_splitter.split_documents(docs)  # Split the documents into chunks
-        embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")  # Initialize the embedding model
-        qdrant_vectorstore = Qdrant.from_documents(split_chunks, embedding_model, location=":memory:", collection_name="Notebook")  # Create a Qdrant vector store
-        qdrant_retriever = qdrant_vectorstore.as_retriever()  # Set the Qdrant vector store as a retriever
-        multiquery_retriever = MultiQueryRetriever.from_llm(retriever=qdrant_retriever, llm=openai_chat_model, include_original=True)  # Create a multi-query retriever on top of the Qdrant retriever
-
-        # Store the multiquery_retriever in the user session
-        cl.user_session.set("multiquery_retriever", multiquery_retriever)
-
-
-@cl.on_message
-async def main(message: cl.Message):
-    # Retrieve the multi-query retriever from session
-    multiquery_retriever = cl.user_session.get("multiquery_retriever")
-    if not multiquery_retriever:
-        await cl.Message(content="No document processing setup found. Please upload a Jupyter notebook first.").send()
-        return
-
-    question = message.content
-    response = handle_query(question, multiquery_retriever)  # Process the question
-
-    msg = cl.Message(content=response)
-    await msg.send()
-
-def handle_query(question, retriever):
-    # Define the retrieval augmented query-answering chain
-    retrieval_augmented_qa_chain = (
-        {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
-        | RunnablePassthrough.assign(context=itemgetter("context"))
-        | {"response": rag_prompt | openai_chat_model, "context": itemgetter("context")}
-    )
-    response = retrieval_augmented_qa_chain.invoke({"question": question})
-    return response["response"].content
-
-# Tokenization function
-def tiktoken_len(text):
-    tokens = tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)
-    return len(tokens)
```
{aims_tutor → notebook_tutor}/__init__.py (RENAMED, file without changes)
{aims_tutor → notebook_tutor}/app.py (RENAMED)

```diff
@@ -1,6 +1,6 @@
 import os
 from dotenv import load_dotenv
-import …
+import notebook_tutor.chainlit_frontend as cl_frontend
 
 # Load environment variables
 load_dotenv()
```
{aims_tutor → notebook_tutor}/chainlit_frontend.py (RENAMED)

```diff
@@ -1,13 +1,20 @@
+import os
+import logging
 import chainlit as cl
 from dotenv import load_dotenv
 from document_processing import DocumentManager
 from retrieval import RetrievalManager
 from langchain_core.messages import AIMessage, HumanMessage
-from graph import …
+from graph import create_tutor_chain, TutorState
 
 # Load environment variables
 load_dotenv()
 
+# Set up logging
+logging.basicConfig(level=logging.INFO)
+
+logger = logging.getLogger(__name__)
+
 @cl.on_chat_start
 async def start_chat():
     settings = {
@@ -18,7 +25,7 @@ async def start_chat():
         "presence_penalty": 0,
     }
     cl.user_session.set("settings", settings)
-    welcome_message = "Welcome to the …
+    welcome_message = "Welcome to the Notebook-Tutor! Please upload a Jupyter notebook (.ipynb and max. 5mb) to start."
     await cl.Message(content=welcome_message).send()
 
     files = None
@@ -42,27 +49,36 @@ async def start_chat():
     # Initialize LangGraph chain with the retrieval chain
     retrieval_chain = cl.user_session.get("retrieval_manager").get_RAG_QA_chain()
     cl.user_session.set("retrieval_chain", retrieval_chain)
-…
-    cl.user_session.set("…
+    tutor_chain = create_tutor_chain(retrieval_chain)
+    cl.user_session.set("tutor_chain", tutor_chain)
+
+    logger.info("Chat started and notebook uploaded successfully.")
 
 @cl.on_message
 async def main(message: cl.Message):
     # Retrieve the LangGraph chain from the session
-    …
+    tutor_chain = cl.user_session.get("tutor_chain")
 
-    if not …
+    if not tutor_chain:
         await cl.Message(content="No document processing setup found. Please upload a Jupyter notebook first.").send()
         return
 
     # Create the initial state with the user message
     user_message = message.content
-    state = …
-    …
+    state = TutorState(
+        messages=[HumanMessage(content=user_message)],
+        next="supervisor",
+        quiz=[],
+        quiz_created=False,
+        question_answered=False,
+        flashcards_created=False,
+        flashcard_filename="",
+    )
 
     print(f"Initial state: {state}")
 
     # Process the message through the LangGraph chain
-    for s in …
+    for s in tutor_chain.stream(state, {"recursion_limit": 10}):
         print(f"State after processing: {s}")
 
         # Extract messages from the state
@@ -75,15 +91,38 @@ async def main(message: cl.Message):
             else:
                 print("Error: No messages found in agent state.")
         else:
+            # Extract the final state
+            final_state = next(iter(s.values()))
+
             # Check if the quiz was created and send it to the frontend
-            if …
-                quiz_message = …
+            if final_state.get("quiz_created"):
+                quiz_message = final_state["messages"][-1].content
                 await cl.Message(content=quiz_message).send()
+
             # Check if a question was answered and send the response to the frontend
-            if …
-                qa_message = …
+            if final_state.get("question_answered"):
+                qa_message = final_state["messages"][-1].content
                 await cl.Message(content=qa_message).send()
 
-            …
+            # Check if flashcards are ready and send the file to the frontend
+            if final_state.get("flashcards_created"):
+                flashcards_message = final_state["messages"][-1].content
+                await cl.Message(content=flashcards_message).send()
+
+                # Create a full path to the file
+                flashcard_filename = final_state["flashcard_filename"]
+                print(f"Flashcard filename: {flashcard_filename}")
+                flashcard_path = os.path.abspath(flashcard_filename)
+                print(f"Flashcard path: {flashcard_path}")
+
+                # Use the File class to send the file
+                file_element = cl.File(name=os.path.basename(flashcard_filename), path=flashcard_path)
+                print(f"Sending flashcards file: {file_element}")
+                await cl.Message(
+                    content="Here are your flashcards:",
+                    elements=[file_element]
+                ).send()
+
+            print("Reached END state.")
 
             break
```
{aims_tutor → notebook_tutor}/document_processing.py (RENAMED)

```diff
@@ -6,7 +6,7 @@ from langchain.retrievers import MultiQueryRetriever
 from langchain_openai.embeddings import OpenAIEmbeddings
 from langchain_openai import ChatOpenAI
 from dotenv import load_dotenv
-from …
+from notebook_tutor.utils import tiktoken_len
 
 # Load environment variables
 load_dotenv()
```
{aims_tutor → notebook_tutor}/graph.py (RENAMED)

```diff
@@ -1,12 +1,14 @@
-from typing import Annotated …
+from typing import Annotated
 from dotenv import load_dotenv
 from langchain_core.tools import tool
 from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
-from langchain_core.messages import AIMessage …
+from langchain_core.messages import AIMessage
 from langchain.agents import AgentExecutor, create_openai_functions_agent
 from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser
 from langchain_openai import ChatOpenAI
 from langgraph.graph import END, StateGraph
+from tools import create_flashcards_tool
+from states import TutorState
 import functools
 
 # Load environment variables
@@ -32,6 +34,9 @@ def get_retrieve_information_tool(retrieval_chain):
     wrapper_instance = RetrievalChainWrapper(retrieval_chain)
     return tool(wrapper_instance.retrieve_information)
 
+# Instantiate the tools
+flashcard_tool = create_flashcards_tool
+
 # Function to create agents
 def create_agent(
     llm: ChatOpenAI,
@@ -60,20 +65,24 @@ def create_agent(
 # Function to create agent nodes
 def agent_node(state, agent, name):
     result = agent.invoke(state)
     if 'messages' not in result:
         raise ValueError(f"No messages found in agent state: {result}")
     new_state = {"messages": state["messages"] + [AIMessage(content=result["output"], name=name)]}
-    …
-    …
-    if name == "QuizAgent"…
+
+    # Set the appropriate flags and next state
+    if name == "QuizAgent":
         new_state["quiz_created"] = True
-    …
-    if name == "QAAgent":
+    elif name == "QAAgent":
         new_state["question_answered"] = True
-    …
+    elif name == "FlashcardsAgent":
+        new_state["flashcards_created"] = True
+        new_state["flashcard_filename"] = result["output"].split('(')[-1].strip(')')
+
+    new_state["next"] = "FINISH"
     return new_state
 
 
+
 # Function to create the supervisor
 def create_team_supervisor(llm: ChatOpenAI, system_prompt, members) -> AgentExecutor:
     """An LLM-based router."""
@@ -112,17 +121,9 @@ def create_team_supervisor(llm: ChatOpenAI, system_prompt, members) -> AgentExec
         | JsonOutputFunctionsParser()
     )
 
-# Define the state for the system
-class AIMSState(TypedDict):
-    messages: List[BaseMessage]
-    next: str
-    quiz: List[dict]
-    quiz_created: bool
-    question_answered: bool
-
 
 # Create the LangGraph chain
-def …
+def create_tutor_chain(retrieval_chain):
 
     retrieve_information_tool = get_retrieve_information_tool(retrieval_chain)
 
@@ -139,37 +140,55 @@
     quiz_agent = create_agent(
         llm,
         [retrieve_information_tool],
-        "You are a quiz creator that generates quizzes based on the provided notebook content.
-
-        """First, You MUST Use the retrieval_inforation_tool to gather context from the notebook to gather relevant and accurate information.
-
+        """You are a quiz creator that generates quizzes based on the provided notebook content.
+        First, You MUST Use the retrieval_inforation_tool to gather context from the notebook to gather relevant and accurate information.
         Next, create a 5-question quiz based on the information you have gathered. Include the answers at the end of the quiz.
-
        Present the quiz to the user in a clear and concise manner."""
     )
 
     quiz_node = functools.partial(agent_node, agent=quiz_agent, name="QuizAgent")
 
+    # Create Flashcards Agent
+    flashcards_agent = create_agent(
+        llm,
+        [retrieve_information_tool, flashcard_tool],
+        """
+        You are the Flashcard creator. Your mission is to create effective and concise flashcards based on the user's query and the content of the provided notebook. Your role involves the following tasks:
+        1. Analyze User Query: Understand the user's request and determine the key concepts and information they need to learn.
+        2. Search Notebook Content: Use the notebook content to gather relevant information and generate accurate and informative flashcards.
+        3. Generate Flashcards: Create a series of flashcards content with clear questions on the front and detailed answers on the back. Ensure that the flashcards cover the essential points and concepts requested by the user.
+        4. Export Flashcards: Use the flashcard_tool to create and export the flashcards in a format that can be easily imported into a flashcard management system, such as Anki.
+
+        Remember, your goal is to help the user learn efficiently and effectively by breaking down the notebook content into manageable, repeatable flashcards."""
+    )
+
+    flashcards_node = functools.partial(agent_node, agent=flashcards_agent, name="FlashcardsAgent")
+
     # Create Supervisor Agent
     supervisor_agent = create_team_supervisor(
         llm,
-        "You are a supervisor tasked with managing a conversation between the following agents: QAAgent, QuizAgent. Given the user request, decide which agent should act next.",
-        ["QAAgent", "QuizAgent"],
+        "You are a supervisor tasked with managing a conversation between the following agents: QAAgent, QuizAgent, FlashcardsAgent. Given the user request, decide which agent should act next.",
+        ["QAAgent", "QuizAgent", "FlashcardsAgent"],
     )
 
     # Build the LangGraph
-    …
-    …
-    …
-    …
-    …
-    …
-    …
-    …
+    tutor_graph = StateGraph(TutorState)
+    tutor_graph.add_node("QAAgent", qa_node)
+    tutor_graph.add_node("QuizAgent", quiz_node)
+    tutor_graph.add_node("FlashcardsAgent", flashcards_node)
+    tutor_graph.add_node("supervisor", supervisor_agent)
+
+    tutor_graph.add_edge("QAAgent", "supervisor")
+    tutor_graph.add_edge("QuizAgent", "supervisor")
+    tutor_graph.add_edge("FlashcardsAgent", "supervisor")
+    tutor_graph.add_conditional_edges(
         "supervisor",
-        lambda x: "FINISH" if x.get("quiz_created") …
-        {"QAAgent": "QAAgent", …
+        lambda x: "FINISH" if x.get("quiz_created") or x.get("question_answered") or x.get("flashcards_created") else x["next"],
+        {"QAAgent": "QAAgent",
+         "QuizAgent": "QuizAgent",
+         "FlashcardsAgent": "FlashcardsAgent",
+         "FINISH": END},
    )
 
-    …
-    return …
+    tutor_graph.set_entry_point("supervisor")
+    return tutor_graph.compile()
```
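To make the new routing easier to follow, here is a small illustrative sketch (not part of the commit) of the predicate passed to `add_conditional_edges` above: once any agent node has set its completion flag, it routes to FINISH and the graph ends; otherwise control follows the supervisor's `next` choice. The example state dictionaries below are hypothetical.

```python
# Illustrative only: the same routing logic used in the
# tutor_graph.add_conditional_edges call in the diff above.
def route(state: dict) -> str:
    # Any completion flag short-circuits the loop back to END.
    if state.get("quiz_created") or state.get("question_answered") or state.get("flashcards_created"):
        return "FINISH"
    # Otherwise follow the supervisor's routing decision.
    return state["next"]

# Supervisor has picked the FlashcardsAgent; no flags are set yet.
print(route({"next": "FlashcardsAgent", "flashcards_created": False}))  # FlashcardsAgent
# FlashcardsAgent has run and set its flag, so the graph terminates.
print(route({"next": "FINISH", "flashcards_created": True}))            # FINISH
```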
{aims_tutor → notebook_tutor}/prompt_templates.py (RENAMED, file without changes)
{aims_tutor → notebook_tutor}/retrieval.py (RENAMED, file without changes)
notebook_tutor/states.py (ADDED)

```diff
@@ -0,0 +1,12 @@
+from typing import List, TypedDict
+from langchain_core.messages import BaseMessage
+
+# Define the state for the system
+class TutorState(TypedDict):
+    messages: List[BaseMessage]
+    next: str
+    quiz: List[dict]
+    quiz_created: bool
+    question_answered: bool
+    flashcards_created: bool
+    flashcard_filename: str
```
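Because TutorState is a plain TypedDict, an initial state is just a dictionary literal. A minimal sketch of constructing one, mirroring what chainlit_frontend.py does in this commit (the import path and the message text are illustrative assumptions):

```python
from langchain_core.messages import HumanMessage
from notebook_tutor.states import TutorState

# Fresh state for one user turn: routing starts at the supervisor
# and every completion flag starts out False.
state = TutorState(
    messages=[HumanMessage(content="Quiz me on the retriever setup")],
    next="supervisor",
    quiz=[],
    quiz_created=False,
    question_answered=False,
    flashcards_created=False,
    flashcard_filename="",
)
```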
notebook_tutor/tools.py (ADDED)

```diff
@@ -0,0 +1,43 @@
+from typing import Optional, Type
+from langchain.pydantic_v1 import BaseModel, Field
+from langchain.tools import BaseTool
+from langchain.callbacks.manager import (
+    AsyncCallbackManagerForToolRun,
+    CallbackManagerForToolRun,
+)
+import csv
+import uuid
+import os
+
+class FlashcardInput(BaseModel):
+    flashcards: list = Field(description="A list of flashcards. Each flashcard should be a dictionary with 'question' and 'answer' keys.")
+
+class FlashcardTool(BaseTool):
+    name = "create_flashcards"
+    description = "Create flashcards in a .csv format suitable for import into Anki"
+    args_schema: Type[BaseModel] = FlashcardInput
+
+    def _run(
+        self, flashcards: list, run_manager: Optional[CallbackManagerForToolRun] = None
+    ) -> str:
+        """Use the tool to create flashcards."""
+        filename = f"flashcards_{uuid.uuid4()}.csv"
+        save_path = os.path.join('flashcards', filename)  # Save in 'flashcards' directory
+        os.makedirs(os.path.dirname(save_path), exist_ok=True)
+        with open(save_path, 'w', newline='') as csvfile:
+            fieldnames = ['Front', 'Back']
+            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+
+            writer.writeheader()
+            for card in flashcards:
+                writer.writerow({'Front': card['question'], 'Back': card['answer']})
+        return save_path
+
+    async def _arun(
+        self, flashcards: list, run_manager: Optional[AsyncCallbackManagerForToolRun] = None
+    ) -> str:
+        """Use the tool asynchronously."""
+        raise NotImplementedError("create_flashcards does not support async")
+
+# Instantiate the tool
+create_flashcards_tool = FlashcardTool()
```
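For a quick check of the new tool outside the graph, a minimal sketch (not part of the commit) of invoking it directly. LangChain's `BaseTool.run` validates the input against `args_schema` before dispatching to `_run`; the import path and card contents here are made up for illustration:

```python
from notebook_tutor.tools import create_flashcards_tool

# Hypothetical cards; each dict needs 'question' and 'answer' keys,
# matching the FlashcardInput schema above.
cards = [
    {"question": "What does FAISS stand for?",
     "answer": "Facebook AI Similarity Search."},
    {"question": "Which method turns a vector store into a retriever?",
     "answer": "as_retriever()."},
]

# Writes flashcards/flashcards_<uuid>.csv and returns the saved path.
csv_path = create_flashcards_tool.run({"flashcards": cards})
print(csv_path)
```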
{aims_tutor → notebook_tutor}/utils.py (RENAMED, file without changes)