JulsdL/DeepPDF_AI
JulsdL committed on May 2, 2024

Commit d83d325 (1 parent: df7870c)

Rebrand to DeepPDF AI with Docker support and backend enhancements

- Rebranded Chainlit to DeepPDF AI, focusing on AI-powered PDF document interaction.
- Updated documentation in chainlit.md and README.md to reflect new project scope and setup instructions.
- Added Dockerfile adjustments for improved deployment, including user permissions.
- Enhanced backend functionality in app.py with new imports and configurations.
- Updated requirements.txt to include uvicorn for ASGI support, aligning with Docker deployment strategy.
- Introduced comprehensive technical details and usage examples for interacting with PDF documents using AI.

Files changed (6)

CHANGELOG.md +11 -0
Dockerfile +14 -5
README.md +29 -1
app.py +12 -10
chainlit.md +36 -8
requirements.txt +1 -1

CHANGELOG.md

@@ -1,3 +1,14 @@

+## v0.1.3 (2024-05-02)
+### Added
+- Rebranded the project to DeepPDF AI, focusing on interacting with PDF documents using AI.
+- Introduced a comprehensive guide and technical details in `chainlit.md`.
+- Added Docker support for easy deployment, including Dockerfile adjustments and user permissions setup.
+- Updated `README.md` with installation, usage, and acknowledgements sections.
+- Enhanced the application's backend with new imports and configurations in `app.py`.
+- Updated `requirements.txt` to include `uvicorn` for ASGI support.
 ## v0.1.2 (2024-05-01)
 ### Added

Dockerfile

@@ -1,11 +1,20 @@
 FROM python:3.9
-WORKDIR /code
-COPY ./requirements.txt /code/requirements.txt
-RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
-COPY . .
-CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]

+RUN useradd -m -u 1000 user
+USER user
+ENV HOME=/home/user \
+  PATH=/home/user/.local/bin:$PATH
+WORKDIR $HOME/app
+COPY --chown=user ./requirements.txt $HOME/app/requirements.txt
+RUN pip install --upgrade pip
+RUN pip install -r requirements.txt
+COPY --chown=user . $HOME/app
+CMD ["chainlit", "run", "app.py", "--port", "7860"]

README.md

@@ -1 +1,29 @@
-# DeepPDF

+# DeepPDF AI
+DeepPDF AI is a specialized application designed to process and interact with PDF documents using advanced AI techniques. It leverages the power of large language models to provide insightful answers to queries based on the contents of the documents. This README outlines the project structure and provides instructions on how to build and run the application.
+## Installation
+To run DeepPDF AI, follow these steps:
+1. Clone the repository to your local machine.
+2. Ensure you have Docker installed.
+3. Build the Docker image:
+   ```bash
+   docker build -t deeppdf-ai .
+   ```
+4. Run the Docker container:
+   ```bash
+   docker run -p 7860:7860 deeppdf-ai
+   ```
+## Usage
+Once the application is running, you can interact with it through a ChainLit interface at `http://localhost:7860/` by sending queries related to the PDF documents it has processed. Example questions include:
+- "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"
+- "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"
+## Acknowledgements
+This project uses technologies including LangChain, OpenAI's GPT models, and Qdrant for vector storage. Thanks to all open-source contributors and organizations that make these tools available.
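After `docker run -p 7860:7860 deeppdf-ai`, it can be handy to confirm that something is actually listening on the published port before opening the browser. The snippet below is a minimal sketch using only the Python standard library; the helper name `port_open` is hypothetical and not part of the repository, and a successful TCP connection only shows that the port is open, not that the Chainlit UI has finished loading.

```python
import socket


def port_open(host="localhost", port=7860, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; 7860 is the port published
    by the `docker run` command in the README.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling `port_open()` with the defaults checks `localhost:7860`; a `False` result usually means the container is still starting or the port mapping is missing.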

app.py

@@ -1,17 +1,19 @@
 import os
-from langchain_openai import ChatOpenAI
-from langchain_community.document_loaders import PyMuPDFLoader
-from langchain.text_splitter import RecursiveCharacterTextSplitter
+from operator import itemgetter
+import chainlit as cl
 import tiktoken
-from langchain_openai.embeddings import OpenAIEmbeddings
-from langchain_community.vectorstores import Qdrant
-from langchain_core.prompts import ChatPromptTemplate
+from dotenv import load_dotenv
+from langchain.text_splitter import RecursiveCharacterTextSplitter
 from langchain.retrievers import MultiQueryRetriever
+from langchain_core.prompts import ChatPromptTemplate
 from langchain_core.runnables import RunnablePassthrough
-from dotenv import load_dotenv
-from operator import itemgetter
-import chainlit as cl
-from chainlit.playground.providers import ChatOpenAI
+from langchain_community.document_loaders import PyMuPDFLoader
+from langchain_community.vectorstores import Qdrant
+from langchain_openai import ChatOpenAI
+from langchain_openai.embeddings import OpenAIEmbeddings
 # Load environment variables
 load_dotenv()

chainlit.md

@@ -1,14 +1,42 @@
-# Welcome to Chainlit! 🚀🤖
-Hi there, Developer! 👋 We're excited to have you on board. Chainlit is a powerful tool designed to help you prototype, debug and share applications built on top of LLMs.
-## Useful Links 🔗
-- **Documentation:** Get started with our comprehensive [Chainlit Documentation](https://docs.chainlit.io) 📚
-- **Discord Community:** Join our friendly [Chainlit Discord](https://discord.gg/k73SQ3FyUh) to ask questions, share your projects, and connect with other developers! 💬
-We can't wait to see what you create with Chainlit! Happy coding! 💻😊
-## Welcome screen
-To modify the welcome screen, edit the `chainlit.md` file at the root of your project. If you do not want a welcome screen, just leave this file empty.

+# Welcome to DeepPDF AI! 🚀📄
+Hello there, curious minds! 🖐️ Welcome aboard the journey through the vast landscapes of chatting with your PDF documents, powered by cutting-edge artificial intelligence.
+DeepPDF AI is an innovative tool designed to enhance your understanding of and interaction with complex documents.
+## Getting Started
+To get started with DeepPDF AI:
+1. Explore the AI magic as it navigates through the Meta FORM 10-K PDF document.
+2. Experiment with different questions and watch how the AI models dig through the document to fetch information.
+3. Query about financial data, company insights, or any intriguing topics you're curious about.
+4. Sample Questions to fuel your exploration:
+   - "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"
+   - "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"
+# How It Works 🧠
+- Chop and Chunk: We start by slicing and dicing your PDFs into bite-sized chunks that our AI can easily digest.
+- Semantic Smoothies: Each piece gets transformed into a flavorful semantic smoothie (a.k.a. embeddings) that represents the essence of the text.
+- Treasure Hunt: Our savvy retriever then sniffs out the most relevant chunks in response to your queries.
+- Wise Whispers: With the right context in hand, our AI cleverly crafts responses that are not just accurate but downright insightful.
+# The Tech Touch 💡🤖
+- Token Tinkering: By breaking down the text using tiktoken, we ensure our AI understands and processes each piece effectively.
+- Embedding Elixir: Powered by OpenAIEmbeddings, we turn text into searchable vectors that capture deep semantic meanings.
+- Retrieval Rodeo: Leveraging the Qdrant vector store, our system retrieves context that is as relevant as it gets.
+- MultiQuery Mastery: Our MultiQueryRetriever doesn’t just take your query at face value — it gets creative, generating three clever variations of your question to boost the chances of uncovering exactly what you need. For instance, if you ask, "Who are Meta's 'Directors'?", it spins this into:
+  1. "Can you provide a list of individuals who serve as 'Directors' at Meta, also known as members of the Board of Directors?"
+  2. "Who are the key individuals that make up Meta's Board of Directors, commonly referred to as 'Directors'?"
+  3. "Could you share information about the individuals who hold the position of 'Directors' at Meta, specifically as members of the Board of Directors?"
+By employing these tailored queries, we maximize the probability of fetching the most precise and relevant information from our vast document landscape.
+## Useful Links 🔗
+- **Documentation:** See the original document [Meta FORM 10-K](https://d18rn0p25nwr6d.cloudfront.net/CIK-0001326801/c7318154-f6ae-4866-89fa-f0c589f2ee3d.pdf) 📘
+Thank you for trying DeepPDF AI !!! 🚀📄
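The pipeline described in `chainlit.md` (chunk the text, score chunks against a query, then merge results across several query variations) can be illustrated with a toy, dependency-free sketch. This is not the code in `app.py`: word counts stand in for tiktoken tokens, token overlap stands in for OpenAIEmbeddings similarity, a plain list stands in for the Qdrant store, and the query variations are passed in by hand rather than generated by an LLM. All function names here are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text, max_tokens=8, overlap=2):
    """Split text into overlapping windows of at most max_tokens words."""
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def score(query, chunk):
    """Count shared tokens; a toy substitute for embedding similarity."""
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    return sum((q & c).values())


def retrieve(query, chunks, k=2):
    """Return up to k highest-scoring chunks that share a token with the query."""
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    return [ch for ch in ranked[:k] if score(query, ch) > 0]


def multi_query_retrieve(variations, chunks, k=2):
    """Retrieve for each query variation and keep the unique union in first-seen order."""
    seen, merged = set(), []
    for query in variations:
        for ch in retrieve(query, chunks, k):
            if ch not in seen:
                seen.add(ch)
                merged.append(ch)
    return merged
```

The point of the merge step is that differently worded variations can surface different chunks, while chunks returned by more than one variation appear only once in the context handed to the model.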

requirements.txt

@@ -7,4 +7,4 @@ tiktoken==0.6.0
 pymupdf==1.24.2
 python-dotenv==1.0.1
 chainlit==0.7.700
-openai==1.24.1

+uvicorn==0.23.2