Rebrand to DeepPDF AI with Docker support and backend enhancements
- Rebranded Chainlit to DeepPDF AI, focusing on AI-powered PDF document interaction.
- Updated documentation in chainlit.md and README.md to reflect new project scope and setup instructions.
- Added Dockerfile adjustments for improved deployment, including user permissions.
- Enhanced backend functionality in app.py with new imports and configurations.
- Updated requirements.txt to include uvicorn for ASGI support, aligning with Docker deployment strategy.
- Introduced comprehensive technical details and usage examples for interacting with PDF documents using AI.
- CHANGELOG.md +11 -0
- Dockerfile +14 -5
- README.md +29 -1
- app.py +12 -10
- chainlit.md +36 -8
- requirements.txt +1 -1
CHANGELOG.md
@@ -1,3 +1,14 @@
+## v0.1.3 (2024-05-02)
+
+### Added
+
+- Rebranded the project to DeepPDF AI, focusing on interacting with PDF documents using AI.
+- Introduced a comprehensive guide and technical details in `chainlit.md`.
+- Added Docker support for easy deployment, including Dockerfile adjustments and user permissions setup.
+- Updated `README.md` with installation, usage, and acknowledgements sections.
+- Enhanced the application's backend with new imports and configurations in `app.py`.
+- Updated `requirements.txt` to include `uvicorn` for ASGI support.
+
 ## v0.1.2 (2024-05-01)
 
 ### Added

Dockerfile
@@ -1,11 +1,20 @@
 FROM python:3.9
 
-
+RUN useradd -m -u 1000 user
 
-
+USER user
 
-
+ENV HOME=/home/user \
+PATH=/home/user/.local/bin:$PATH
 
-
+WORKDIR $HOME/app
 
-
+COPY --chown=user ./requirements.txt $HOME/app/requirements.txt
+
+RUN pip install --upgrade pip
+
+RUN pip install -r requirements.txt
+
+COPY --chown=user . $HOME/app
+
+CMD ["chainlit", "run", "app.py", "--port", "7860"]

README.md
@@ -1 +1,29 @@
-# DeepPDF
+# DeepPDF AI
+
+DeepPDF AI is a specialized application designed to process and interact with PDF documents using advanced AI techniques. It leverages the power of large language models to provide insightful answers to queries based on the contents of the documents. This README outlines the project structure and provides instructions on how to build and run the application.
+
+## Installation
+
+To run DeepPDF AI, follow these steps:
+
+1. Clone the repository to your local machine.
+2. Ensure you have Docker installed.
+3. Build the Docker image:
+```bash
+docker build -t deeppdf-ai .
+```
+4. Run the Docker container:
+```bash
+docker run -p 7860:7860 deeppdf-ai
+```
+
+## Usage
+
+Once the application is running, you can interact with it through a ChainLit interface at `http://localhost:7860/` by sending queries related to the PDF documents it has processed. Example questions include:
+
+- "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"
+- "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"
+
+## Acknowledgements
+
+This project uses technologies including LangChain, OpenAI's GPT models, and Qdrant for vector storage. Thanks to all open-source contributors and organizations that make these tools available.

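The README's run step publishes the container on port 7860. As a small sketch for confirming that the port is accepting connections after `docker run -p 7860:7860 deeppdf-ai`, the helper below opens a TCP connection using only the standard library. The function name `port_is_open` is hypothetical and is not part of this commit; it checks reachability only, not that the Chainlit UI itself is healthy.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 7860, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 7860 is the port passed to `chainlit run` in the Dockerfile's CMD.
    print("reachable" if port_is_open() else "not reachable")
```

If the check reports "not reachable", the `-p 7860:7860` mapping and the container logs are the first things to look at.
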
app.py
@@ -1,17 +1,19 @@
 import os
-from
-
-
+from operator import itemgetter
+
+import chainlit as cl
 import tiktoken
-from
-
-
+from dotenv import load_dotenv
+
+
+from langchain.text_splitter import RecursiveCharacterTextSplitter
 from langchain.retrievers import MultiQueryRetriever
+from langchain_core.prompts import ChatPromptTemplate
 from langchain_core.runnables import RunnablePassthrough
-from
-from
-
-from
+from langchain_community.document_loaders import PyMuPDFLoader
+from langchain_community.vectorstores import Qdrant
+from langchain_openai import ChatOpenAI
+from langchain_openai.embeddings import OpenAIEmbeddings
 
 # Load environment variables
 load_dotenv()

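This hunk shows only imports, so how `app.py` wires them together is not visible here. The imports of `itemgetter` and `ChatPromptTemplate` suggest that a question is pulled out of an input dict, sent to a retriever, and merged into a prompt. The library-free sketch below illustrates that routing idea under that assumption. `fake_retriever`, `build_prompt`, the prompt text, and the toy passages are all hypothetical stand-ins, not code from this commit.

```python
import re
from operator import itemgetter

# Hypothetical prompt text; the real template in app.py is not shown in the diff.
PROMPT_TEMPLATE = "Answer using only this context:\n{context}\n\nQuestion: {question}"

# Toy passages standing in for chunks stored in a vector store.
PASSAGES = [
    "Cash and cash equivalents are reported on the balance sheet.",
    "The Board of Directors is listed in the proxy statement.",
]


def keywords(text: str) -> set:
    """Lowercase words longer than three letters, used as a crude relevance signal."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def fake_retriever(question: str) -> list:
    """Hypothetical retriever: keep passages that share a keyword with the question."""
    wanted = keywords(question)
    return [p for p in PASSAGES if wanted & keywords(p)]


def build_prompt(inputs: dict) -> str:
    """Route the 'question' key to the retriever, then fill the prompt template."""
    question = itemgetter("question")(inputs)
    context = "\n".join(fake_retriever(question))
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    print(build_prompt({"question": "Who sits on the board?"}))
```

In a LangChain chain the same shape would typically be expressed by composing runnables rather than calling plain functions, but the data flow (question in, context retrieved, prompt filled) is the part this sketch is meant to show.
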
chainlit.md
@@ -1,14 +1,42 @@
-# Welcome to
+# Welcome to DeepPDF AI!
 
-
+Hello there, curious minds! Welcome aboard the journey through the vast landscapes of chatting with your PDF documents, powered by cutting-edge artificial intelligence.
 
-
+DeepPDF AI is an innovative tool designed to enhance your understanding of and interaction with complex documents.
+
+## Getting Started
+
+To get started with DeepPDF AI:
+
+1. Explore the AI magic as it navigates through the Meta FORM 10-K PDF document.
+2. Experiment with different questions and watch how the AI models dig through the document to fetch information.
+3. Query about financial data, company insights, or any intriguing topics you're curious about.
+4. Sample Questions to fuel your exploration:
+- "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"
+- "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"
+
+# How It Works
 
--
--
+- Chop and Chunk: We start by slicing and dicing your PDFs into bite-sized chunks that our AI can easily digest.
+- Semantic Smoothies: Each piece gets transformed into a flavorful semantic smoothie (a.k.a. embeddings) that represents the essence of the text.
+- Treasure Hunt: Our savvy retriever then sniffs out the most relevant chunks in response to your queries.
+- Wise Whispers: With the right context in hand, our AI cleverly crafts responses that are not just accurate but downright insightful.
 
-
+# The Tech Touch
+
+- Token Tinkering: By breaking down the text using tiktoken, we ensure our AI understands and processes each piece effectively.
+- Embedding Elixir: Powered by OpenAIEmbeddings, we turn text into searchable vectors that capture deep semantic meanings.
+- Retrieval Rodeo: Leveraging the Qdrant vector store, our system retrieves context that is as relevant as it gets.
+- MultiQuery Mastery: Our MultiQueryRetriever doesn't just take your query at face value - it gets creative, generating three clever variations of your question to boost the chances of uncovering exactly what you need. For instance, if you ask, "Who are Meta's 'Directors'?", it spins this into:
+
+1. "Can you provide a list of individuals who serve as 'Directors' at Meta, also known as members of the Board of Directors?"
+2. "Who are the key individuals that make up Meta's Board of Directors, commonly referred to as 'Directors'?"
+3. "Could you share information about the individuals who hold the position of 'Directors' at Meta, specifically as members of the Board of Directors?"
+
+By employing these tailored queries, we maximize the probability of fetching the most precise and relevant information from our vast document landscape.
+
+## Useful Links
 
-
+- **Documentation:** See the original document [Meta FORM 10-K](https://d18rn0p25nwr6d.cloudfront.net/CIK-0001326801/c7318154-f6ae-4866-89fa-f0c589f2ee3d.pdf)
 
-
+Thank you for trying DeepPDF AI !!!

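The "How It Works" and "Tech Touch" bullets in `chainlit.md` describe a chunk, embed, retrieve pipeline with multi-query expansion. The sketch below is a library-free illustration of those steps, not the code in `app.py`. Word-count vectors stand in for OpenAIEmbeddings, a plain list stands in for the Qdrant store, and the query variations are written by hand where MultiQueryRetriever would ask an LLM to generate them. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 5, overlap: int = 1) -> list:
    """Split text into windows of `size` words, each sharing `overlap` words with the next."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (assumption, not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def multi_query_retrieve(queries: list, chunks: list, k: int = 2) -> list:
    """Retrieve for each query variation and return the de-duplicated union, in first-seen order."""
    merged = []
    for query in queries:
        for chunk in retrieve(query, chunks, k):
            if chunk not in merged:
                merged.append(chunk)
    return merged


if __name__ == "__main__":
    text = "alpha beta gamma delta epsilon zeta eta theta iota kappa lambda mu"
    chunks = chunk_words(text)
    print(multi_query_retrieve(["kappa lambda", "lambda mu", "alpha beta"], chunks, k=1))
```

The union step is why several phrasings can help: a chunk that one wording misses may still be surfaced by another, and duplicates are dropped before the context is handed to the model.
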
requirements.txt
@@ -7,4 +7,4 @@ tiktoken==0.6.0
 pymupdf==1.24.2
 python-dotenv==1.0.1
 chainlit==0.7.700
-
+uvicorn==0.23.2

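The commit message says `uvicorn` was added "for ASGI support". For readers unfamiliar with the term, the sketch below shows what an ASGI application looks like at its smallest: an async callable taking `scope`, `receive`, and `send`. It is an illustration only and is not how this project starts its server (the Dockerfile uses `chainlit run`). The helper `call_once` is hypothetical and drives the app in-process so no server is needed.

```python
import asyncio


async def app(scope, receive, send):
    """Minimal ASGI application: answers every HTTP request with the body 'ok'."""
    if scope["type"] != "http":
        return  # ignore non-HTTP scopes such as lifespan or websocket
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"ok"})


async def call_once(path: str = "/") -> list:
    """Call the app with a hand-built HTTP scope and collect the messages it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return sent


if __name__ == "__main__":
    for message in asyncio.run(call_once()):
        print(message["type"])
```

Assuming this were saved as a hypothetical `example.py`, a server such as uvicorn could serve the same callable with `uvicorn example:app --port 7860`.
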