Commit 48d9af7 by JulsdL · 1 parent: e3c5c37

Implement Flashcard creation tool and update project naming

- Added FlashcardTool in notebook_tutor/tools.py, which generates flashcards as CSV files suitable for import into Anki.
- Introduced the FlashcardInput model, which defines the structure of the flashcard input.
- Created TutorState in notebook_tutor/states.py to manage the application state, including the flashcard-creation status.
- Updated README.md to reflect the project's new name, AI-Notebook-Tutor, and the updated command for running the application.
- Removed the obsolete main.py.
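states.py itself is not part of this diff, but chainlit_frontend.py constructs `TutorState` with seven keys, and graph.py drops the old `AIMSState` TypedDict. A minimal sketch of what the new state could look like, assuming `TutorState` is a TypedDict like the `AIMSState` it replaces and that the two flashcard keys are the only additions. `messages` is typed as `List[Any]` instead of LangChain's `BaseMessage` so the sketch runs without LangChain, and `initial_state` is a hypothetical helper.

```python
from typing import Any, List, TypedDict


class TutorState(TypedDict):
    """Sketch of the tutor graph state; keys match those set in chainlit_frontend.py."""
    messages: List[Any]        # BaseMessage objects in the real app (assumption)
    next: str                  # node the supervisor routes to next
    quiz: List[dict]
    quiz_created: bool
    question_answered: bool
    flashcards_created: bool   # set when the FlashcardsAgent node has run
    flashcard_filename: str    # path of the generated CSV file


def initial_state(user_text: str) -> TutorState:
    """Hypothetical helper mirroring the TutorState(...) call in main()."""
    return TutorState(
        messages=[user_text],  # the real app wraps this in HumanMessage
        next="supervisor",
        quiz=[],
        quiz_created=False,
        question_answered=False,
        flashcards_created=False,
        flashcard_filename="",
    )
```

Because a TypedDict is a plain dict at runtime, a node can return a partial update such as `{"flashcards_created": True}`, which is the pattern `agent_node` in graph.py follows.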

README.md CHANGED
@@ -1,8 +1,8 @@
- # AIMS-Tutor
+ # AI-Notebook-Tutor
 
  # RAG Application for QA in Jupyter Notebook
 
- AIMS-Tutor is designed to provide question-answering capabilities in a Jupyter Notebook using the Retrieval Augmented Generation (RAG) model. It's built on top of the LangChain and Chainlit platforms, and it uses the OpenAI API for the chat model.
+ AI-Notebook-Tutor is designed to provide question-answering capabilities in a Jupyter Notebook using the Retrieval Augmented Generation (RAG) model. It's built on top of the LangChain and Chainlit platforms, and it uses the OpenAI API for the chat model.
 
  ## Features
 
@@ -28,7 +28,7 @@ OPENAI_API_KEY=your-key-here
  4. Run the application using the following command:
 
  ```bash
- chainlit run aims_tutor/app.py
+ chainlit run notebook_tutor/app.py
  ```
 
  ## Usage
flashcards/flashcards_03fb423e-2087-4598-9ebb-b99a20db0b93.csv ADDED
@@ -0,0 +1,11 @@
+ Front,Back
+ What is the first step in the retriever creation process?,The first step is loading a fine-tuned embedding model from the hub using the HuggingFaceEmbeddings class. The model used is named 'JulsdL/e2erag-arctic-m' and it is configured to run on a CUDA device.
+ Which class is used to load the embedding model in the retriever creation process?,The HuggingFaceEmbeddings class is used to load the embedding model.
+ What is the name of the embedding model used in the retriever creation process?,The embedding model used is named 'JulsdL/e2erag-arctic-m'.
+ On which device is the embedding model 'JulsdL/e2erag-arctic-m' configured to run?,The embedding model is configured to run on a CUDA device.
+ What is the purpose of setting up a VectorStore in the retriever creation process?,The VectorStore is set up to power the dense vector search and is populated with documents that have been split into chunks and embedded using the embedding model.
+ Which tool is used to set up the VectorStore in the retriever creation process?,Meta's FAISS (Facebook AI Similarity Search) is used to set up the VectorStore.
+ How are documents prepared for the VectorStore in the retriever creation process?,Documents are split into chunks and embedded using the previously loaded embedding model before being added to the VectorStore.
+ What is the final step in the retriever creation process?,The final step is converting the VectorStore into a retriever that can fetch relevant documents or chunks based on query embeddings.
+ "In the context of retriever creation, what is the purpose of converting the VectorStore to a retriever?",Converting the VectorStore to a retriever allows it to be used for fetching relevant documents or chunks based on the query embeddings.
+ How does a retriever enhance the model's ability in a Retrieval-Augmented Generation (RAG) setup?,"A retriever enhances the model's ability by providing contextually relevant responses based on the input queries, improving the quality and relevance of the generated content."
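Each added CSV file follows the same layout: a `Front,Back` header, one card per row, and standard CSV quoting whenever a field contains a comma, as in the quoted question and answer near the end of the file above. The FlashcardTool source is not part of this diff, so the following is only a minimal sketch of a writer that produces this layout with Python's `csv` module. It assumes a hypothetical `FlashcardInput` dataclass holding (front, back) pairs and reuses the `flashcards/flashcards_<uuid4>.csv` naming visible in the added file names.

```python
import csv
import os
import uuid
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FlashcardInput:
    """Hypothetical input model: one (front, back) pair per card."""
    cards: List[Tuple[str, str]]


def write_flashcards_csv(data: FlashcardInput, out_dir: str = "flashcards") -> str:
    """Write the cards to <out_dir>/flashcards_<uuid4>.csv and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"flashcards_{uuid.uuid4()}.csv")
    # newline="" lets the csv module control line endings; QUOTE_MINIMAL
    # quotes only fields that contain commas, quotes, or newlines.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        writer.writerow(["Front", "Back"])
        writer.writerows(data.cards)
    return path
```

Using `csv.writer` instead of joining strings by hand is what keeps an answer that contains commas in a single Back column.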
flashcards/flashcards_30a9e627-3d47-45fc-81db-a991269edb24.csv ADDED
@@ -0,0 +1,11 @@
+ Front,Back
+ What command is used to clone a repository from GitHub?,!git clone https://github.com/arcee-ai/DALM
+ How do you install a package using pip and upgrade it if necessary?,!pip install --upgrade -q -e .
+ "Which command is used to install the latest versions of langchain, langchain-core, langchain-community, and sentence_transformers?",!pip install -qU langchain langchain-core langchain-community sentence_transformers
+ How do you install the pymupdf and faiss-cpu libraries using pip?,!pip install -qU pymupdf faiss-cpu
+ What is the import statement for pandas?,import pandas as pd
+ How do you import HuggingFaceEmbeddings from the langchain_community library?,from langchain_community.embeddings import HuggingFaceEmbeddings
+ What is the import statement for FAISS from the langchain_community library?,from langchain_community.vectorstores import FAISS
+ Which import statement is used for SimpleDirectoryReader from llama_index.core?,from llama_index.core import SimpleDirectoryReader
+ How do you import SimpleNodeParser from the llama_index.core.node_parser module?,from llama_index.core.node_parser import SimpleNodeParser
+ What is the import statement for MetadataMode from llama_index.core.schema?,from llama_index.core.schema import MetadataMode
flashcards/flashcards_37136597-b97c-4889-9dc1-c3b5e10084c1.csv ADDED
@@ -0,0 +1,11 @@
+ Front,Back
+ What is the first step in the retriever creation process?,The first step in the retriever creation process is loading the embedding model.
+ Which model is used as the embedding model in the retriever creation process?,The embedding model used is 'JulsdL/e2erag-arctic-m'.
+ Which class and module are used to load the embedding model?,The embedding model is loaded using the HuggingFaceEmbeddings class from the langchain_community.embeddings module.
+ On which device is the embedding model configured to run?,The embedding model is configured to run on a CUDA device.
+ What is the purpose of the VectorStore in the retriever creation process?,The VectorStore is created to power dense vector searches.
+ Which technology is used to set up the VectorStore?,Meta's FAISS (Facebook AI Similarity Search) is used to set up the VectorStore.
+ How are the documents prepared for creating the VectorStore?,The documents are split into chunks before creating the VectorStore.
+ Which method is called to create the VectorStore from the documents and embedding model?,The method FAISS.from_documents is called to create the VectorStore.
+ How is the VectorStore converted into a retriever?,The VectorStore is converted into a retriever by invoking the as_retriever() method on the vector_store object.
+ What are the benefits of the retriever created through this process?,"The retriever leverages the power of dense embeddings and efficient search capabilities provided by FAISS, making it effective for retrieval-augmented tasks."
flashcards/flashcards_4dbaa0d1-6363-4584-9691-824540220571.csv ADDED
@@ -0,0 +1,11 @@
+ Front,Back
+ What is the first step in creating a retriever for a document?,The first step is loading the documents using a loader like `PyMuPDFLoader` to load documents from a PDF file.
+ Which function is used to load documents from a PDF file?,The function `PyMuPDFLoader` is used to load documents from a PDF file.
+ How are documents split into chunks for processing?,"Documents are split into chunks using the `RecursiveCharacterTextSplitter`, which divides the documents based on predefined rules like token limits and split characters."
+ What is the purpose of chunking documents?,Chunking documents into smaller parts helps in managing and processing them more efficiently for embedding and retrieval.
+ Which library is used for loading a pre-trained embedding model?,The `HuggingFaceEmbeddings` library is used for loading a pre-trained embedding model.
+ How do you specify the device for running the embedding model?,"You specify the device (e.g., 'cuda') in the `model_kwargs` parameter when loading the embedding model."
+ What is Meta's FAISS used for in the retriever creation process?,Meta's FAISS is used to power dense vector search by creating a vector store from the document chunks and embedding model.
+ How do you convert a vector store into a retriever?,You convert a vector store into a retriever using the `as_retriever` method.
+ What is the role of the retriever in the retrieval-augmented generation (RAG) system?,"The retriever fetches relevant document chunks based on query vectors, providing contextually relevant responses."
+ Which embedding model is used in the example provided for retriever creation?,The embedding model 'JulsdL/e2erag-arctic-m' from Hugging Face is used in the example.
flashcards/flashcards_52137f3d-8a8d-4891-9216-ab3bbc6ee66a.csv ADDED
@@ -0,0 +1,11 @@
+ Front,Back
+ What library is used to load the fine-tuned embedding model for retriever creation?,The HuggingFaceEmbeddings class is used to load the fine-tuned embedding model.
+ What model is loaded for embedding in the retriever creation process?,The model 'JulsdL/e2erag-arctic-m' is loaded for embedding.
+ On which device is the embedding model set to run?,The embedding model is set to run on a CUDA device.
+ Which class is used to set up the vector store in retriever creation?,The FAISS class from Meta is used to set up the vector store.
+ What is the purpose of the vector store in the retriever creation process?,The vector store powers the dense vector search by storing documents that have been split into chunks and embedded.
+ How is the vector store created in the retriever creation process?,The vector store is created from documents that have been split into chunks and embedded using the loaded embedding model.
+ How is the vector store converted into a retriever?,The vector store is converted into a retriever using the 'as_retriever()' method.
+ What is a retriever used for in the context of retriever creation?,A retriever is used to retrieve context based on a query for a language model.
+ What is the benefit of creating a retriever in a RAG setup?,Creating a retriever enhances the capabilities of language models by providing them with relevant context for generating responses.
+ What does 'RAG' stand for in the context of retriever creation?,'RAG' stands for Retrieval-Augmented Generation.
flashcards/flashcards_545f9f54-f8b0-4549-b27d-66b7303a017e.csv ADDED
@@ -0,0 +1,11 @@
+ Front,Back
+ What tool is used to load documents from a PDF file in the retriever creation process?,The PyMuPDFLoader is used to load documents from a PDF file.
+ What is the purpose of the RecursiveCharacterTextSplitter in the retriever creation process?,The RecursiveCharacterTextSplitter is used to divide the documents into manageable chunks based on specific rules such as token limits and preferred split characters.
+ Which library is used to create a dense vector store in the retriever creation process?,Meta's FAISS library is used to create a dense vector store from the document chunks and the embedding model.
+ How do you load a pre-trained embedding model from Hugging Face to run on a CUDA device?,Use the HuggingFaceEmbeddings class with the model name and specify the device as CUDA in the model_kwargs parameter.
+ What is the role of the embedding model in the retriever creation process?,The embedding model is used to convert text into vectors that can be stored in the vector store for efficient retrieval.
+ How do you convert a vector store into a retriever?,Use the as_retriever method on the vector store to convert it into a retriever.
+ What is the first step in the retriever creation process?,The first step is loading the documents using PyMuPDFLoader.
+ Which model name is used in the provided example for the embedding model?,"The model name used is ""JulsdL/e2erag-arctic-m""."
+ What is the purpose of the vector store in the retriever creation process?,"The vector store holds the dense vector representations of document chunks, allowing for efficient retrieval based on query embeddings."
+ Which Python import is necessary to use FAISS for creating a vector store?,You need to import FAISS from langchain_community.vectorstores.
flashcards/flashcards_af3650ed-34b5-4040-b672-036e1cf3b8e3.csv ADDED
@@ -0,0 +1,11 @@
+ Front,Back
+ What is the first step in creating a retriever?,The first step in creating a retriever is loading a fine-tuned embedding model from the hub using the HuggingFaceEmbeddings class.
+ Which class is used to load the embedding model?,The HuggingFaceEmbeddings class is used to load the embedding model.
+ How is the model 'JulsdL/e2erag-arctic-m' loaded onto the GPU?,"The model 'JulsdL/e2erag-arctic-m' is loaded onto the GPU by setting the model_kwargs parameter to {""device"": ""cuda""} in the HuggingFaceEmbeddings class."
+ What library is used to set up the vector store?,Meta's FAISS library is used to set up the vector store.
+ What does the vector store manage?,The vector store manages the dense vectors generated by the embedding model.
+ How is the vector store converted into a retriever?,The vector store is converted into a retriever using the as_retriever() method.
+ What is the purpose of the retriever in this context?,The purpose of the retriever is to efficiently fetch relevant document vectors based on the query vectors.
+ What is the role of the embedding model in the retriever creation process?,"The embedding model generates dense vectors for the documents, which are then managed by the vector store and used by the retriever to find relevant information."
+ Which parameter in the HuggingFaceEmbeddings class specifies the device to be used?,"The model_kwargs parameter specifies the device to be used, such as ""cuda"" for GPU."
+ What does the FAISS library stand for?,FAISS stands for Facebook AI Similarity Search.
main.py DELETED
@@ -1,118 +0,0 @@
- import os
- from operator import itemgetter
-
- import chainlit as cl
- import tiktoken
- from dotenv import load_dotenv
-
-
- from langchain.text_splitter import RecursiveCharacterTextSplitter
- from langchain.retrievers import MultiQueryRetriever
- from langchain_core.prompts import ChatPromptTemplate
- from langchain_core.runnables import RunnablePassthrough
- from langchain_community.document_loaders import PyMuPDFLoader, PythonLoader, NotebookLoader
- from langchain_community.vectorstores import Qdrant
- from langchain_openai import ChatOpenAI
- from langchain_openai.embeddings import OpenAIEmbeddings
-
- # Load environment variables
- load_dotenv()
-
- # Configuration for OpenAI
- OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
- openai_chat_model = ChatOpenAI(model="gpt-4-turbo", temperature=0.1)
-
- # Define the RAG prompt
- RAG_PROMPT = """
- CONTEXT:
- {context}
-
- QUERY:
- {question}
-
- Answer the query in a pretty format if the context is related to it; otherwise, answer: 'Sorry, I can't answer.'
- """
- rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
-
-
- # ChainLit setup for chat interaction
- @cl.on_chat_start
- async def start_chat():
- settings = {
- "model": "gpt-3.5-turbo",
- "temperature": 0,
- "top_p": 1,
- "frequency_penalty": 0,
- "presence_penalty": 0,
- }
- cl.user_session.set("settings", settings)
-
- # Display a welcoming message with instructions
- welcome_message = "Welcome to the AIMS-Tutor! Please upload a Jupyter notebook (.ipynb and max. 5mb) to start."
- await cl.Message(content=welcome_message).send()
-
- # Wait for the user to upload a file
- files = None
- while files is None:
- files = await cl.AskFileMessage(
- content="Please upload a Jupyter notebook (.ipynb, max. 5mb):",
- accept={"application/x-ipynb+json": [".ipynb"]},
- max_size_mb=5
- ).send()
-
- file = files[0] # Get the first file
-
- if file:
- # Load the Jupyter notebook
- notebook_path = file.path # Extract the path from the AskFileResponse object
-
- loader = NotebookLoader(
- notebook_path,
- include_outputs=False,
- max_output_length=20,
- remove_newline=True,
- traceback=False
- )
- docs = loader.load()
- cl.user_session.set("docs", docs) # Store the docs in the user session
-
- # Initialize the retriever components after loading document
- text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=50, length_function=tiktoken_len) # Initialize the text splitter
- split_chunks = text_splitter.split_documents(docs) # Split the documents into chunks
- embedding_model = OpenAIEmbeddings(model="text-embedding-3-small") # Initialize the embedding model
- qdrant_vectorstore = Qdrant.from_documents(split_chunks, embedding_model, location=":memory:", collection_name="Notebook") # Create a Qdrant vector store
- qdrant_retriever = qdrant_vectorstore.as_retriever() # Set the Qdrant vector store as a retriever
- multiquery_retriever = MultiQueryRetriever.from_llm(retriever=qdrant_retriever, llm=openai_chat_model, include_original=True) # Create a multi-query retriever on top of the Qdrant retriever
-
- # Store the multiquery_retriever in the user session
- cl.user_session.set("multiquery_retriever", multiquery_retriever)
-
-
- @cl.on_message
- async def main(message: cl.Message):
- # Retrieve the multi-query retriever from session
- multiquery_retriever = cl.user_session.get("multiquery_retriever")
- if not multiquery_retriever:
- await cl.Message(content="No document processing setup found. Please upload a Jupyter notebook first.").send()
- return
-
- question = message.content
- response = handle_query(question, multiquery_retriever) # Process the question
-
- msg = cl.Message(content=response)
- await msg.send()
-
- def handle_query(question, retriever):
- # Define the retrieval augmented query-answering chain
- retrieval_augmented_qa_chain = (
- {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
- | RunnablePassthrough.assign(context=itemgetter("context"))
- | {"response": rag_prompt | openai_chat_model, "context": itemgetter("context")}
- )
- response = retrieval_augmented_qa_chain.invoke({"question": question})
- return response["response"].content
-
- # Tokenization function
- def tiktoken_len(text):
- tokens = tiktoken.encoding_for_model("gpt-3.5-turbo").encode(text)
- return len(tokens)
{aims_tutor → notebook_tutor}/__init__.py RENAMED
File without changes
{aims_tutor → notebook_tutor}/app.py RENAMED
@@ -1,6 +1,6 @@
  import os
  from dotenv import load_dotenv
- import aims_tutor.chainlit_frontend as cl_frontend
+ import notebook_tutor.chainlit_frontend as cl_frontend
 
  # Load environment variables
  load_dotenv()
{aims_tutor → notebook_tutor}/chainlit_frontend.py RENAMED
@@ -1,13 +1,20 @@
+ import os
+ import logging
  import chainlit as cl
  from dotenv import load_dotenv
  from document_processing import DocumentManager
  from retrieval import RetrievalManager
  from langchain_core.messages import AIMessage, HumanMessage
- from graph import create_aims_chain, AIMSState
+ from graph import create_tutor_chain, TutorState
 
  # Load environment variables
  load_dotenv()
 
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+
+ logger = logging.getLogger(__name__)
+
  @cl.on_chat_start
  async def start_chat():
  settings = {
@@ -18,7 +25,7 @@ async def start_chat():
  "presence_penalty": 0,
  }
  cl.user_session.set("settings", settings)
- welcome_message = "Welcome to the AIMS-Tutor! Please upload a Jupyter notebook (.ipynb and max. 5mb) to start."
+ welcome_message = "Welcome to the Notebook-Tutor! Please upload a Jupyter notebook (.ipynb and max. 5mb) to start."
  await cl.Message(content=welcome_message).send()
 
  files = None
@@ -42,27 +49,36 @@ async def start_chat():
  # Initialize LangGraph chain with the retrieval chain
  retrieval_chain = cl.user_session.get("retrieval_manager").get_RAG_QA_chain()
  cl.user_session.set("retrieval_chain", retrieval_chain)
- aims_chain = create_aims_chain(retrieval_chain)
- cl.user_session.set("aims_chain", aims_chain)
+ tutor_chain = create_tutor_chain(retrieval_chain)
+ cl.user_session.set("tutor_chain", tutor_chain)
+
+ logger.info("Chat started and notebook uploaded successfully.")
 
  @cl.on_message
  async def main(message: cl.Message):
  # Retrieve the LangGraph chain from the session
- aims_chain = cl.user_session.get("aims_chain")
+ tutor_chain = cl.user_session.get("tutor_chain")
 
- if not aims_chain:
+ if not tutor_chain:
  await cl.Message(content="No document processing setup found. Please upload a Jupyter notebook first.").send()
  return
 
  # Create the initial state with the user message
  user_message = message.content
- state = AIMSState(messages=[HumanMessage(content=user_message)], next="supervisor", quiz=[], quiz_created=False, question_answered=False)
-
+ state = TutorState(
+ messages=[HumanMessage(content=user_message)],
+ next="supervisor",
+ quiz=[],
+ quiz_created=False,
+ question_answered=False,
+ flashcards_created=False,
+ flashcard_filename="",
+ )
 
  print(f"Initial state: {state}")
 
  # Process the message through the LangGraph chain
- for s in aims_chain.stream(state, {"recursion_limit": 10}):
+ for s in tutor_chain.stream(state, {"recursion_limit": 10}):
  print(f"State after processing: {s}")
 
  # Extract messages from the state
@@ -75,15 +91,38 @@ async def main(message: cl.Message):
  else:
  print("Error: No messages found in agent state.")
  else:
+ # Extract the final state
+ final_state = next(iter(s.values()))
+
  # Check if the quiz was created and send it to the frontend
- if state["quiz_created"]:
- quiz_message = state["messages"][-1].content
+ if final_state.get("quiz_created"):
+ quiz_message = final_state["messages"][-1].content
  await cl.Message(content=quiz_message).send()
+
  # Check if a question was answered and send the response to the frontend
- if state["question_answered"]:
- qa_message = state["messages"][-1].content
+ if final_state.get("question_answered"):
+ qa_message = final_state["messages"][-1].content
  await cl.Message(content=qa_message).send()
 
- print("Reached end state.")
+ # Check if flashcards are ready and send the file to the frontend
+ if final_state.get("flashcards_created"):
+ flashcards_message = final_state["messages"][-1].content
+ await cl.Message(content=flashcards_message).send()
+
+ # Create a full path to the file
+ flashcard_filename = final_state["flashcard_filename"]
+ print(f"Flashcard filename: {flashcard_filename}")
+ flashcard_path = os.path.abspath(flashcard_filename)
+ print(f"Flashcard path: {flashcard_path}")
+
+ # Use the File class to send the file
+ file_element = cl.File(name=os.path.basename(flashcard_filename), path=flashcard_path)
+ print(f"Sending flashcards file: {file_element}")
+ await cl.Message(
+ content="Here are your flashcards:",
+ elements=[file_element]
+ ).send()
+
+ print("Reached END state.")
 
  break
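The flashcard branch in `main()` depends on `final_state["flashcard_filename"]`, which `agent_node` in graph.py derives with `result["output"].split('(')[-1].strip(')')`. That works when the agent's reply ends with the path in parentheses, for example a Markdown link, but any text after the closing parenthesis (even a period) ends up in the filename, and a reply without parentheses yields the whole reply. A minimal sketch of a stricter alternative, assuming the tool always writes to `flashcards/flashcards_<uuid4>.csv` as the added file names suggest; both function names are hypothetical.

```python
import re

# Matches paths such as flashcards/flashcards_<uuid4>.csv (assumed naming scheme).
_FLASHCARD_PATH = re.compile(
    r"flashcards/flashcards_"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.csv"
)


def naive_filename(output: str) -> str:
    """The parsing used in agent_node: text after the last '(' minus surrounding ')'."""
    return output.split('(')[-1].strip(')')


def extract_flashcard_path(output: str) -> str:
    """Hypothetical stricter parser: return the first matching path, or '' if none."""
    match = _FLASHCARD_PATH.search(output)
    return match.group(0) if match else ""
```

With this version an empty string signals that no file was found, which the frontend could check before building the `cl.File` element; as written, `os.path.abspath("")` resolves to the current working directory rather than raising an error.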
{aims_tutor → notebook_tutor}/document_processing.py RENAMED
@@ -6,7 +6,7 @@ from langchain.retrievers import MultiQueryRetriever
  from langchain_openai.embeddings import OpenAIEmbeddings
  from langchain_openai import ChatOpenAI
  from dotenv import load_dotenv
- from aims_tutor.utils import tiktoken_len
+ from notebook_tutor.utils import tiktoken_len
 
  # Load environment variables
  load_dotenv()
{aims_tutor β†’ notebook_tutor}/graph.py RENAMED
@@ -1,12 +1,14 @@
1
- from typing import Annotated, List, TypedDict
2
  from dotenv import load_dotenv
3
  from langchain_core.tools import tool
4
  from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
5
- from langchain_core.messages import AIMessage, BaseMessage
6
  from langchain.agents import AgentExecutor, create_openai_functions_agent
7
  from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser
8
  from langchain_openai import ChatOpenAI
9
  from langgraph.graph import END, StateGraph
 
 
10
  import functools
11
 
12
  # Load environment variables
@@ -32,6 +34,9 @@ def get_retrieve_information_tool(retrieval_chain):
32
  wrapper_instance = RetrievalChainWrapper(retrieval_chain)
33
  return tool(wrapper_instance.retrieve_information)
34
 
 
 
 
35
  # Function to create agents
36
  def create_agent(
37
  llm: ChatOpenAI,
@@ -60,20 +65,24 @@ def create_agent(
60
  # Function to create agent nodes
61
  def agent_node(state, agent, name):
62
  result = agent.invoke(state)
63
- if 'messages' not in result: # Check if messages are present in the agent state
64
  raise ValueError(f"No messages found in agent state: {result}")
65
  new_state = {"messages": state["messages"] + [AIMessage(content=result["output"], name=name)]}
66
- if "next" in result:
67
- new_state["next"] = result["next"]
68
- if name == "QuizAgent" and "quiz_created" in state and not state["quiz_created"]:
69
  new_state["quiz_created"] = True
70
- new_state["next"] = "FINISH" # Finish the conversation after the quiz is created and wait for a new user input
71
- if name == "QAAgent":
72
  new_state["question_answered"] = True
73
- new_state["next"] = "question_answered"
 
 
 
 
74
  return new_state
75
 
76
 
 
77
  # Function to create the supervisor
78
  def create_team_supervisor(llm: ChatOpenAI, system_prompt, members) -> AgentExecutor:
79
  """An LLM-based router."""
@@ -112,17 +121,9 @@ def create_team_supervisor(llm: ChatOpenAI, system_prompt, members) -> AgentExec
112
  | JsonOutputFunctionsParser()
113
  )
114
 
115
- # Define the state for the system
116
- class AIMSState(TypedDict):
117
- messages: List[BaseMessage]
118
- next: str
119
- quiz: List[dict]
120
- quiz_created: bool
121
- question_answered: bool
122
-
123
 
124
  # Create the LangGraph chain
125
- def create_aims_chain(retrieval_chain):
126
 
127
  retrieve_information_tool = get_retrieve_information_tool(retrieval_chain)
128
 
@@ -139,37 +140,55 @@ def create_aims_chain(retrieval_chain):
139
  quiz_agent = create_agent(
140
  llm,
141
  [retrieve_information_tool],
142
- "You are a quiz creator that generates quizzes based on the provided notebook content."
143
-
144
- """First, You MUST Use the retrieval_inforation_tool to gather context from the notebook to gather relevant and accurate information.
145
-
146
  Next, create a 5-question quiz based on the information you have gathered. Include the answers at the end of the quiz.
147
-
148
  Present the quiz to the user in a clear and concise manner."""
149
  )
150
 
151
  quiz_node = functools.partial(agent_node, agent=quiz_agent, name="QuizAgent")
152
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
153
  # Create Supervisor Agent
154
  supervisor_agent = create_team_supervisor(
155
  llm,
156
- "You are a supervisor tasked with managing a conversation between the following agents: QAAgent, QuizAgent. Given the user request, decide which agent should act next.",
157
- ["QAAgent", "QuizAgent"],
158
  )
159
 
160
  # Build the LangGraph
161
- aims_graph = StateGraph(AIMSState)
162
- aims_graph.add_node("QAAgent", qa_node)
163
- aims_graph.add_node("QuizAgent", quiz_node)
164
- aims_graph.add_node("supervisor", supervisor_agent)
165
-
166
- aims_graph.add_edge("QAAgent", "supervisor")
167
- aims_graph.add_edge("QuizAgent", "supervisor")
168
- aims_graph.add_conditional_edges(
 
 
169
  "supervisor",
170
- lambda x: "FINISH" if x.get("quiz_created") else ("FINISH" if x.get("question_answered") else x["next"]),
171
- {"QAAgent": "QAAgent", "QuizAgent": "QuizAgent", "WAIT": END, "FINISH": END, "question_answered": END},
 
 
 
172
  )
173
 
174
- aims_graph.set_entry_point("supervisor")
175
- return aims_graph.compile()
 
1
+ from typing import Annotated
2
  from dotenv import load_dotenv
3
  from langchain_core.tools import tool
4
  from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
5
+ from langchain_core.messages import AIMessage
6
  from langchain.agents import AgentExecutor, create_openai_functions_agent
7
  from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser
8
  from langchain_openai import ChatOpenAI
9
  from langgraph.graph import END, StateGraph
10
+ from tools import create_flashcards_tool
11
+ from states import TutorState
12
  import functools
13
 
14
  # Load environment variables
 
34
  wrapper_instance = RetrievalChainWrapper(retrieval_chain)
35
  return tool(wrapper_instance.retrieve_information)
36
 
37
+ # Instantiate the tools
38
+ flashcard_tool = create_flashcards_tool
39
+
40
  # Function to create agents
41
  def create_agent(
42
  llm: ChatOpenAI,
 
65
  # Function to create agent nodes
66
  def agent_node(state, agent, name):
67
  result = agent.invoke(state)
68
+ if 'messages' not in result:
69
  raise ValueError(f"No messages found in agent state: {result}")
      new_state = {"messages": state["messages"] + [AIMessage(content=result["output"], name=name)]}
+
+     # Set the appropriate flags and next state
+     if name == "QuizAgent":
          new_state["quiz_created"] = True
+     elif name == "QAAgent":
          new_state["question_answered"] = True
+     elif name == "FlashcardsAgent":
+         new_state["flashcards_created"] = True
+         new_state["flashcard_filename"] = result["output"].split('(')[-1].strip(')')
+
+     new_state["next"] = "FINISH"
      return new_state


+
  # Function to create the supervisor
  def create_team_supervisor(llm: ChatOpenAI, system_prompt, members) -> AgentExecutor:
      """An LLM-based router."""
@@ … @@
          | JsonOutputFunctionsParser()
      )


  # Create the LangGraph chain
+ def create_tutor_chain(retrieval_chain):

      retrieve_information_tool = get_retrieve_information_tool(retrieval_chain)

@@ … @@
      quiz_agent = create_agent(
          llm,
          [retrieve_information_tool],
+         """You are a quiz creator that generates quizzes based on the provided notebook content.
+         First, you MUST use the retrieve_information tool to gather relevant and accurate context from the notebook.
          Next, create a 5-question quiz based on the information you have gathered. Include the answers at the end of the quiz.
          Present the quiz to the user in a clear and concise manner."""
      )

      quiz_node = functools.partial(agent_node, agent=quiz_agent, name="QuizAgent")

+     # Create Flashcards Agent
+     flashcards_agent = create_agent(
+         llm,
+         [retrieve_information_tool, flashcard_tool],
+         """
+         You are the Flashcard creator. Your mission is to create effective and concise flashcards based on the user's query and the content of the provided notebook. Your role involves the following tasks:
+         1. Analyze User Query: Understand the user's request and determine the key concepts and information they need to learn.
+         2. Search Notebook Content: Use the notebook content to gather relevant information and generate accurate and informative flashcards.
+         3. Generate Flashcards: Create a series of flashcards with clear questions on the front and detailed answers on the back. Ensure that the flashcards cover the essential points and concepts requested by the user.
+         4. Export Flashcards: Use the create_flashcards tool to create and export the flashcards in a format that can be easily imported into a flashcard management system, such as Anki.
+
+         Remember, your goal is to help the user learn efficiently and effectively by breaking down the notebook content into manageable, repeatable flashcards."""
+     )
+
+     flashcards_node = functools.partial(agent_node, agent=flashcards_agent, name="FlashcardsAgent")
+
      # Create Supervisor Agent
      supervisor_agent = create_team_supervisor(
          llm,
+         "You are a supervisor tasked with managing a conversation between the following agents: QAAgent, QuizAgent, FlashcardsAgent. Given the user request, decide which agent should act next.",
+         ["QAAgent", "QuizAgent", "FlashcardsAgent"],
      )

      # Build the LangGraph
+     tutor_graph = StateGraph(TutorState)
+     tutor_graph.add_node("QAAgent", qa_node)
+     tutor_graph.add_node("QuizAgent", quiz_node)
+     tutor_graph.add_node("FlashcardsAgent", flashcards_node)
+     tutor_graph.add_node("supervisor", supervisor_agent)
+
+     tutor_graph.add_edge("QAAgent", "supervisor")
+     tutor_graph.add_edge("QuizAgent", "supervisor")
+     tutor_graph.add_edge("FlashcardsAgent", "supervisor")
+     tutor_graph.add_conditional_edges(
          "supervisor",
+         lambda x: "FINISH" if x.get("quiz_created") or x.get("question_answered") or x.get("flashcards_created") else x["next"],
+         {"QAAgent": "QAAgent",
+          "QuizAgent": "QuizAgent",
+          "FlashcardsAgent": "FlashcardsAgent",
+          "FINISH": END},
      )

+     tutor_graph.set_entry_point("supervisor")
+     return tutor_graph.compile()
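The routing rule on the supervisor's conditional edge can be exercised without LangGraph. The sketch below is a plain-Python approximation, not the project's code: `END` is a hypothetical string stand-in for `langgraph.graph.END`, and `route_after_supervisor`, `next_node`, `PATH_MAP` and `COMPLETION_FLAGS` are illustrative names. It copies the path map from `create_tutor_chain` and applies the same "finish once any completion flag is set" test as the lambda.

```python
from typing import Dict, Mapping

# Hypothetical stand-in for langgraph.graph.END so the sketch runs without LangGraph.
END = "END"

# Copy of the path map passed to add_conditional_edges in create_tutor_chain.
PATH_MAP: Dict[str, str] = {
    "QAAgent": "QAAgent",
    "QuizAgent": "QuizAgent",
    "FlashcardsAgent": "FlashcardsAgent",
    "FINISH": END,
}

# The flags that agent_node sets after each agent has run.
COMPLETION_FLAGS = ("quiz_created", "question_answered", "flashcards_created")


def route_after_supervisor(state: Mapping[str, object]) -> str:
    """Same rule as the lambda: return FINISH once any completion flag is truthy."""
    if any(state.get(flag) for flag in COMPLETION_FLAGS):
        return "FINISH"
    return str(state["next"])


def next_node(state: Mapping[str, object]) -> str:
    """Resolve the routing key through the path map (KeyError if it is unmapped)."""
    return PATH_MAP[route_after_supervisor(state)]


if __name__ == "__main__":
    # First supervisor pass: no flags set yet, so the supervisor's choice is used.
    print(next_node({"next": "FlashcardsAgent"}))  # FlashcardsAgent
    # After FlashcardsAgent has run and set its flag, the graph stops.
    print(next_node({"next": "FlashcardsAgent", "flashcards_created": True}))  # END
```

Since the supervisor node runs after every agent and (judging by the `x["next"]` lookup) writes its own `next` value, the completion flags, rather than the `"FINISH"` that `agent_node` stores in `next`, are presumably what ends the run. The previous graph also mapped `"WAIT"` and `"question_answered"` to `END`; if the supervisor can still emit a key outside the new map, the lookup would fail, as it does in the sketch.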
{aims_tutor → notebook_tutor}/prompt_templates.py RENAMED
File without changes
{aims_tutor → notebook_tutor}/retrieval.py RENAMED
File without changes
notebook_tutor/states.py ADDED
@@ -0,0 +1,12 @@
+ from typing import List, TypedDict
+ from langchain_core.messages import BaseMessage
+
+ # Define the state for the system
+ class TutorState(TypedDict):
+     messages: List[BaseMessage]
+     next: str
+     quiz: List[dict]
+     quiz_created: bool
+     question_answered: bool
+     flashcards_created: bool
+     flashcard_filename: str
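The `TutorState` flags are filled in by `agent_node` in `agents.py`. The sketch below reproduces that update step on a simplified copy of the state so it runs without LangChain. Assumptions: messages are plain strings instead of `BaseMessage`, and `TutorStateSketch` and `apply_agent_result` are hypothetical names. The filename extraction uses the same `split('(')[-1].strip(')')` expression as the commit.

```python
from typing import List, TypedDict


# Simplified, hypothetical TutorState: messages are plain strings, not BaseMessage.
class TutorStateSketch(TypedDict, total=False):
    messages: List[str]
    next: str
    quiz_created: bool
    question_answered: bool
    flashcards_created: bool
    flashcard_filename: str


def apply_agent_result(state: TutorStateSketch, name: str, output: str) -> TutorStateSketch:
    """Mimic agent_node: append the reply, set the agent's flag, request FINISH."""
    new_state: TutorStateSketch = {"messages": state.get("messages", []) + [output]}
    if name == "QuizAgent":
        new_state["quiz_created"] = True
    elif name == "QAAgent":
        new_state["question_answered"] = True
    elif name == "FlashcardsAgent":
        new_state["flashcards_created"] = True
        # Same parsing as agent_node: keep the text after the last "(" and drop ")".
        new_state["flashcard_filename"] = output.split("(")[-1].strip(")")
    new_state["next"] = "FINISH"
    return new_state


if __name__ == "__main__":
    reply = "Your flashcards are ready (flashcards/flashcards_1234.csv)"
    state = apply_agent_result({"messages": ["Make flashcards on retrieval"]}, "FlashcardsAgent", reply)
    print(state["flashcard_filename"])  # flashcards/flashcards_1234.csv
    print(state["next"])  # FINISH
```

The parse only works when the agent's reply ends with the saved path in parentheses. A reply without parentheses yields the whole reply as the "filename", and trailing punctuation after the closing parenthesis is kept.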
notebook_tutor/tools.py ADDED
@@ -0,0 +1,43 @@
+ from typing import Optional, Type
+ from langchain.pydantic_v1 import BaseModel, Field
+ from langchain.tools import BaseTool
+ from langchain.callbacks.manager import (
+     AsyncCallbackManagerForToolRun,
+     CallbackManagerForToolRun,
+ )
+ import csv
+ import uuid
+ import os
+
+ class FlashcardInput(BaseModel):
+     flashcards: list = Field(description="A list of flashcards. Each flashcard should be a dictionary with 'question' and 'answer' keys.")
+
+ class FlashcardTool(BaseTool):
+     name: str = "create_flashcards"
+     description: str = "Create flashcards in a .csv format suitable for import into Anki"
+     args_schema: Type[BaseModel] = FlashcardInput
+
+     def _run(
+         self, flashcards: list, run_manager: Optional[CallbackManagerForToolRun] = None
+     ) -> str:
+         """Use the tool to create flashcards."""
+         filename = f"flashcards_{uuid.uuid4()}.csv"
+         save_path = os.path.join('flashcards', filename)  # Save in 'flashcards' directory
+         os.makedirs(os.path.dirname(save_path), exist_ok=True)
+         with open(save_path, 'w', newline='', encoding='utf-8') as csvfile:
+             fieldnames = ['Front', 'Back']
+             writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+
+             writer.writeheader()
+             for card in flashcards:
+                 writer.writerow({'Front': card['question'], 'Back': card['answer']})
+         return save_path
+
+     async def _arun(
+         self, flashcards: list, run_manager: Optional[AsyncCallbackManagerForToolRun] = None
+     ) -> str:
+         """Use the tool asynchronously."""
+         raise NotImplementedError("create_flashcards does not support async")
+
+ # Instantiate the tool
+ create_flashcards_tool = FlashcardTool()
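The file-writing part of `FlashcardTool._run` depends only on the standard library, so it can be checked in isolation. The sketch below mirrors that logic outside LangChain; `write_flashcards_csv` and `read_flashcards_csv` are hypothetical helper names, and writing into a temporary directory instead of `flashcards/` is an assumption made so the example leaves no files behind. Reading the file back shows what an importer would receive: a `Front,Back` header (as in the committed example CSV), then one row per card.

```python
import csv
import os
import tempfile
import uuid
from typing import Dict, List


def write_flashcards_csv(flashcards: List[Dict[str, str]], directory: str) -> str:
    """Write question/answer dicts as a Front,Back CSV, mirroring FlashcardTool._run."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"flashcards_{uuid.uuid4()}.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Front", "Back"])
        writer.writeheader()
        for card in flashcards:
            writer.writerow({"Front": card["question"], "Back": card["answer"]})
    return path


def read_flashcards_csv(path: str) -> List[Dict[str, str]]:
    """Read the file back as a list of {"Front": ..., "Back": ...} rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    cards = [
        {"question": "Which class loads the embedding model?", "answer": "HuggingFaceEmbeddings"},
        {"question": "Does a comma, like this one, break the CSV?", "answer": "No, the field is quoted."},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = write_flashcards_csv(cards, tmp)
        for row in read_flashcards_csv(path):
            print(row["Front"], "->", row["Back"])
```

`csv.DictWriter` uses minimal quoting by default, so questions or answers that contain commas are wrapped in quotes rather than splitting into extra columns, which matters because LLM-generated answers often contain commas.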
{aims_tutor → notebook_tutor}/utils.py RENAMED
File without changes