JulsdL committed on
Commit
d83d325
·
1 Parent(s): df7870c

Rebrand to DeepPDF AI with Docker support and backend enhancements

- Rebranded Chainlit to DeepPDF AI, focusing on AI-powered PDF document interaction.
- Updated documentation in chainlit.md and README.md to reflect new project scope and setup instructions.
- Added Dockerfile adjustments for improved deployment, including user permissions.
- Enhanced backend functionality in app.py with new imports and configurations.
- Updated requirements.txt to include uvicorn for ASGI support, aligning with Docker deployment strategy.
- Introduced comprehensive technical details and usage examples for interacting with PDF documents using AI.

Files changed (6)
  1. CHANGELOG.md +11 -0
  2. Dockerfile +14 -5
  3. README.md +29 -1
  4. app.py +12 -10
  5. chainlit.md +36 -8
  6. requirements.txt +1 -1
CHANGELOG.md CHANGED
@@ -1,3 +1,14 @@
+ ## v0.1.3 (2024-05-02)
+
+ ### Added
+
+ - Rebranded the project to DeepPDF AI, focusing on interacting with PDF documents using AI.
+ - Introduced a comprehensive guide and technical details in `chainlit.md`.
+ - Added Docker support for easy deployment, including Dockerfile adjustments and user permissions setup.
+ - Updated `README.md` with installation, usage, and acknowledgements sections.
+ - Enhanced the application's backend with new imports and configurations in `app.py`.
+ - Updated `requirements.txt` to include `uvicorn` for ASGI support.
+
  ## v0.1.2 (2024-05-01)
 
  ### Added
Dockerfile CHANGED
@@ -1,11 +1,20 @@
  FROM python:3.9
 
- WORKDIR /code
+ RUN useradd -m -u 1000 user
 
- COPY ./requirements.txt /code/requirements.txt
+ USER user
 
- RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
+ ENV HOME=/home/user \
+ PATH=/home/user/.local/bin:$PATH
 
- COPY . .
+ WORKDIR $HOME/app
 
- CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
+ COPY --chown=user ./requirements.txt $HOME/app/requirements.txt
+
+ RUN pip install --upgrade pip
+
+ RUN pip install -r requirements.txt
+
+ COPY --chown=user . $HOME/app
+
+ CMD ["chainlit", "run", "app.py", "--port", "7860"]
README.md CHANGED
@@ -1 +1,29 @@
- # DeepPDF
+ # DeepPDF AI
+
+ DeepPDF AI is a specialized application designed to process and interact with PDF documents using advanced AI techniques. It leverages the power of large language models to provide insightful answers to queries based on the contents of the documents. This README outlines the project structure and provides instructions on how to build and run the application.
+
+ ## Installation
+
+ To run DeepPDF AI, follow these steps:
+
+ 1. Clone the repository to your local machine.
+ 2. Ensure you have Docker installed.
+ 3. Build the Docker image:
+ ```bash
+ docker build -t deeppdf-ai .
+ ```
+ 4. Run the Docker container:
+ ```bash
+ docker run -p 7860:7860 deeppdf-ai
+ ```
+
+ ## Usage
+
+ Once the application is running, you can interact with it through a ChainLit interface at `http://localhost:7860/` by sending queries related to the PDF documents it has processed. Example questions include:
+
+ - "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"
+ - "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"
+
+ ## Acknowledgements
+
+ This project uses technologies including LangChain, OpenAI's GPT models, and Qdrant for vector storage. Thanks to all open-source contributors and organizations that make these tools available.
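
Reviewer note: the README's `docker run` step gives no way to tell when the interface is actually reachable. As a rough sketch, the helper below polls a TCP port until it accepts connections. The name `wait_for_port` and the timeout values are illustrative assumptions and not part of the repository; only the port number 7860 comes from the README.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper for illustration; not part of the DeepPDF AI repository.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the host is unreachable); retry shortly.
            time.sleep(interval)
    return False
```

After starting the container, calling `wait_for_port("localhost", 7860)` should return `True` once the published port accepts connections. This only checks the TCP socket, not that the Chainlit page itself renders.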
app.py CHANGED
@@ -1,17 +1,19 @@
  import os
- from langchain_openai import ChatOpenAI
- from langchain_community.document_loaders import PyMuPDFLoader
- from langchain.text_splitter import RecursiveCharacterTextSplitter
+ from operator import itemgetter
+
+ import chainlit as cl
  import tiktoken
- from langchain_openai.embeddings import OpenAIEmbeddings
- from langchain_community.vectorstores import Qdrant
- from langchain_core.prompts import ChatPromptTemplate
+ from dotenv import load_dotenv
+
+
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
  from langchain.retrievers import MultiQueryRetriever
+ from langchain_core.prompts import ChatPromptTemplate
  from langchain_core.runnables import RunnablePassthrough
- from dotenv import load_dotenv
- from operator import itemgetter
- import chainlit as cl
- from chainlit.playground.providers import ChatOpenAI
+ from langchain_community.document_loaders import PyMuPDFLoader
+ from langchain_community.vectorstores import Qdrant
+ from langchain_openai import ChatOpenAI
+ from langchain_openai.embeddings import OpenAIEmbeddings
 
  # Load environment variables
  load_dotenv()
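
Reviewer note: besides reordering, this hunk drops `from chainlit.playground.providers import ChatOpenAI`. In the old file that line came after `from langchain_openai import ChatOpenAI`, so the later import rebound the name `ChatOpenAI`. The sketch below illustrates that rebinding with two standard-library functions that happen to share a name; `math.sqrt` and `cmath.sqrt` are only stand-ins for the two classes.

```python
import cmath

# Stand-ins for the two ChatOpenAI imports in the old app.py:
from math import sqrt   # first binding of the name `sqrt`
from cmath import sqrt  # rebinds `sqrt`; math.sqrt is no longer reachable by this name

# The name now refers to the cmath version, which accepts negative input
# (math.sqrt would raise ValueError here).
result = sqrt(-1)
assert sqrt is cmath.sqrt
print(result)  # 1j
```

With a single import of the name left in the new file, `ChatOpenAI` can only refer to the `langchain_openai` class.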
chainlit.md CHANGED
@@ -1,14 +1,42 @@
- # Welcome to Chainlit! 🚀🤖
+ # Welcome to DeepPDF AI! 🚀📄
 
- Hi there, Developer! 👋 We're excited to have you on board. Chainlit is a powerful tool designed to help you prototype, debug and share applications built on top of LLMs.
+ Hello there, curious minds! 🖐️ Welcome aboard the journey through the vast landscapes of chatting with your PDF documents, powered by cutting-edge artificial intelligence.
 
- ## Useful Links 🔗
+ DeepPDF AI is an innovative tool designed to enhance your understanding of and interaction with complex documents.
+
+ ## Getting Started
+
+ To get started with DeepPDF AI:
+
+ 1. Explore the AI magic as it navigates through the Meta FORM 10-K PDF document.
+ 2. Experiment with different questions and watch how the AI models dig through the document to fetch information.
+ 3. Query about financial data, company insights, or any intriguing topics you're curious about.
+ 4. Sample Questions to fuel your exploration:
+ - "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"
+ - "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"
+
+ # How It Works 🧠
 
- - **Documentation:** Get started with our comprehensive [Chainlit Documentation](https://docs.chainlit.io) 📚
- - **Discord Community:** Join our friendly [Chainlit Discord](https://discord.gg/k73SQ3FyUh) to ask questions, share your projects, and connect with other developers! 💬
+ - Chop and Chunk: We start by slicing and dicing your PDFs into bite-sized chunks that our AI can easily digest.
+ - Semantic Smoothies: Each piece gets transformed into a flavorful semantic smoothie (a.k.a. embeddings) that represents the essence of the text.
+ - Treasure Hunt: Our savvy retriever then sniffs out the most relevant chunks in response to your queries.
+ - Wise Whispers: With the right context in hand, our AI cleverly crafts responses that are not just accurate but downright insightful.
 
- We can't wait to see what you create with Chainlit! Happy coding! 💻😊
+ # The Tech Touch 💡🤖
+
+ - Token Tinkering: By breaking down the text using tiktoken, we ensure our AI understands and processes each piece effectively.
+ - Embedding Elixir: Powered by OpenAIEmbeddings, we turn text into searchable vectors that capture deep semantic meanings.
+ - Retrieval Rodeo: Leveraging the Qdrant vector store, our system retrieves context that is as relevant as it gets.
+ - MultiQuery Mastery: Our MultiQueryRetriever doesn’t just take your query at face value — it gets creative, generating three clever variations of your question to boost the chances of uncovering exactly what you need. For instance, if you ask, "Who are Meta's 'Directors'?", it spins this into:
+
+ 1. "Can you provide a list of individuals who serve as 'Directors' at Meta, also known as members of the Board of Directors?"
+ 2. "Who are the key individuals that make up Meta's Board of Directors, commonly referred to as 'Directors'?"
+ 3. "Could you share information about the individuals who hold the position of 'Directors' at Meta, specifically as members of the Board of Directors?"
+
+ By employing these tailored queries, we maximize the probability of fetching the most precise and relevant information from our vast document landscape.
+
+ ## Useful Links 🔗
 
- ## Welcome screen
+ - **Documentation:** See the original document [Meta FORM 10-K](https://d18rn0p25nwr6d.cloudfront.net/CIK-0001326801/c7318154-f6ae-4866-89fa-f0c589f2ee3d.pdf) 📘
 
- To modify the welcome screen, edit the `chainlit.md` file at the root of your project. If you do not want a welcome screen, just leave this file empty.
+ Thank you for trying DeepPDF AI !!! 🚀📄
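
Reviewer note: the "How It Works" and "MultiQuery Mastery" bullets describe a chunk, embed, retrieve, and multi-query flow. The sketch below walks through that flow using only the standard library, so it is a toy model and not the app's implementation. Word windows stand in for tiktoken-based splitting, bag-of-words counts stand in for OpenAIEmbeddings, an in-memory list stands in for Qdrant, and the query variants are hand-written where MultiQueryRetriever would ask an LLM to generate them. All function names and sample strings are invented for illustration; none of the text comes from the 10-K.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 12, overlap: int = 4) -> list[str]:
    """Split text into overlapping word windows (toy stand-in for a token splitter)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Bag-of-words counts (toy stand-in for a learned embedding vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def multi_query_retrieve(queries: list[str], chunks: list[str], k: int = 2) -> list[str]:
    """Run every query variant and merge the hits, dropping duplicates in order."""
    seen, merged = set(), []
    for q in queries:
        for c in retrieve(q, chunks, k):
            if c not in seen:
                seen.add(c)
                merged.append(c)
    return merged


# Invented placeholder chunks, not taken from any real filing.
chunks = [
    "the board of directors oversees company strategy",
    "cash and cash equivalents are reported on the balance sheet",
    "the cafeteria serves lunch at noon",
]
# Hand-written variants; the real retriever would generate these with an LLM.
variants = ["who are the directors", "who sits on the board"]
hits = multi_query_retrieve(variants, chunks, k=1)
```

Both variants rank the board-of-directors chunk first, so the merged result contains it once. The point of merging is that a chunk missed by one phrasing can still be surfaced by another.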
requirements.txt CHANGED
@@ -7,4 +7,4 @@ tiktoken==0.6.0
  pymupdf==1.24.2
  python-dotenv==1.0.1
  chainlit==0.7.700
- openai==1.24.1
+ uvicorn==0.23.2