# DeepResearch

[Demo](https://jina.ai/deepsearch#demo) | [API](#official-api) | [Evaluation](#evaluation)

Keep searching, reading webpages, and reasoning until an answer is found (or the token budget is exceeded). Useful for deeply investigating a query.

```mermaid
---
config:
  theme: mc
  look: handDrawn
---
flowchart LR
    subgraph Loop["until budget exceeded"]
        direction LR
        Search["Search"]
        Read["Read"]
        Reason["Reason"]
    end
    Query(["Query"]) --> Loop
    Search --> Read
    Read --> Reason
    Reason --> Search
    Loop --> Answer(["Answer"])
```

Unlike OpenAI's and Gemini's Deep Research capabilities, we focus solely on **delivering accurate answers through our iterative process**. We don't optimize for long-form articles – if you need quick, precise answers from deep search, you're in the right place. If you're looking for AI-generated reports like those from OpenAI/Gemini, this isn't for you.

## Install

```bash
git clone https://github.com/jina-ai/node-DeepResearch.git
cd node-DeepResearch
npm install
```

[Installation and deployment video tutorial on YouTube](https://youtu.be/vrpraFiPUyA)

It is also available on npm, but that is not recommended for now, as the code is still under active development.

## Usage

We use Gemini (latest `gemini-2.0-flash`) / OpenAI / a [local LLM](#use-local-llm) for reasoning, and [Jina Reader](https://jina.ai/reader) for searching and reading webpages. You can get a free Jina API key with 1M tokens from jina.ai.

```bash
export GEMINI_API_KEY=...  # for gemini
# export OPENAI_API_KEY=... # for openai
# export LLM_PROVIDER=openai # for openai
export JINA_API_KEY=jina_...  # free jina api key, get from https://jina.ai/reader

npm run dev $QUERY
```

### Official API

You can also use our official DeepSearch API, hosted and optimized by Jina AI:

```
https://deepsearch.jina.ai/v1/chat/completions
```

You can use it with any OpenAI-compatible client (see the example after the guidelines below). For the Bearer authentication, get your Jina API key from https://jina.ai.

#### Client integration guidelines

If you are building a web/local/mobile client that uses the `Jina DeepSearch API`, here are some design guidelines:

- Our API is fully compatible with the [OpenAI API schema](https://platform.openai.com/docs/api-reference/chat/create), which should greatly simplify the integration process. The model name is `jina-deepsearch-v1`.
- Our DeepSearch API is a reasoning+search grounding LLM, so it's best for questions that require deep reasoning and search.
- Two special tokens are introduced, `<think>` and `</think>`, which wrap the reasoning content. Please render them with care.
- Guide the user to get a Jina API key from https://jina.ai; a new API key comes with 1M free tokens.
- There are rate limits, [between 10 RPM and 30 RPM depending on the API key tier](https://jina.ai/contact-sales#rate-limit).
- [Download the Jina AI logo here](https://jina.ai/logo-Jina-1024.zip)
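For example, with the official `openai` Node.js SDK (a minimal sketch; any OpenAI-compatible client works the same way — point the base URL at the endpoint above and pass your Jina API key as the Bearer token; the query is just an example):

```typescript
import OpenAI from "openai";

// Minimal sketch: call the DeepSearch API through an OpenAI-compatible client.
// Base URL and model name are taken from this README.
const client = new OpenAI({
  baseURL: "https://deepsearch.jina.ai/v1",
  apiKey: process.env.JINA_API_KEY, // your Jina API key from https://jina.ai
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "jina-deepsearch-v1",
    messages: [{ role: "user", content: "what is the latest blog post's title from jina ai?" }],
  });
  console.log(completion.choices[0].message.content);
}

main();
```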
## Demo

> The demos were recorded with `gemini-1.5-flash`; the latest `gemini-2.0-flash` leads to much better results!

Query: `"what is the latest blog post's title from jina ai?"`
3 steps; answer is correct!
![demo1](.github/visuals/demo.gif)

Query: `"what is the context length of readerlm-v2?"`
2 steps; answer is correct!
![demo1](.github/visuals/demo3.gif)

Query: `"list all employees from jina ai that u can find, as many as possible"`
11 steps; partially correct! But I'm not in the list :(
![demo1](.github/visuals/demo2.gif)

Query: `"who will be the biggest competitor of Jina AI"`
42 steps; it's a prediction about the future, so it's arguably correct! At the moment I don't see `weaviate` as a competitor, but I'm open to a future "I told you so" moment.
![demo1](.github/visuals/demo4.gif)

More examples:

```bash
# example: no tool calling
npm run dev "1+1="
npm run dev "what is the capital of France?"

# example: 2-step
npm run dev "what is the latest news from Jina AI?"

# example: 3-step
npm run dev "what is the twitter account of jina ai's founder"

# example: 13-step, ambiguous question (no def of "big")
npm run dev "who is bigger? cohere, jina ai, voyage?"

# example: open question, research-like, long chain of thoughts
npm run dev "who will be president of US in 2028?"
npm run dev "what should be jina ai strategy for 2025?"
```

## Use Local LLM

> Note: not every LLM works with our reasoning flow; we need models that support structured output (sometimes called JSON Schema output or object output) well. Feel free to propose a PR to add more open-source LLMs to the working list.

If you use Ollama or LMStudio, you can redirect the reasoning request to your local LLM by setting the following environment variables:

```bash
export LLM_PROVIDER=openai  # yes, that's right - for local llm we still use the openai client
export OPENAI_BASE_URL=http://127.0.0.1:1234/v1  # your local llm endpoint
export OPENAI_API_KEY=whatever  # any random string will do, as we don't use it (unless your local LLM requires authentication)
export DEFAULT_MODEL_NAME=qwen2.5-7b  # your local llm model name
```

## OpenAI-Compatible Server API

If you have a GUI client that supports the OpenAI API (e.g. [CherryStudio](https://docs.cherry-ai.com/), [Chatbox](https://github.com/Bin-Huang/chatbox)), you can simply configure it to use this server.

![demo1](.github/visuals/demo6.gif)

Start the server:

```bash
# Without authentication
npm run serve

# With authentication (clients must provide this secret as Bearer token)
npm run serve --secret=your_secret_token
```

The server will start on http://localhost:3000 with the following endpoint:

### POST /v1/chat/completions

```bash
# Without authentication
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jina-deepsearch-v1",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

# With authentication (when server is started with --secret)
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_secret_token" \
  -d '{
    "model": "jina-deepsearch-v1",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true
  }'
```

Response format:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "jina-deepsearch-v1",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "YOUR FINAL ANSWER"
    },
    "logprobs": null,
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```

For streaming responses (`stream: true`), the server sends chunks in this format:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "jina-deepsearch-v1",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [{
    "index": 0,
    "delta": {
      "content": "..."
    },
    "logprobs": null,
    "finish_reason": null
  }]
}
```

Note: the think content in streaming responses is wrapped in XML tags:

```
<think>
[thinking steps...]
</think>
[final answer]
```
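Because the reasoning is streamed inside `<think>...</think>`, clients usually buffer the stream and separate the think content from the final answer. Below is a minimal sketch against the local server, assuming the `openai` SDK; the query is just an example and error handling is omitted:

```typescript
import OpenAI from "openai";

// Sketch: stream from the local server started with `npm run serve` and split
// the <think>...</think> reasoning trace from the final answer.
const client = new OpenAI({
  baseURL: "http://localhost:3000/v1",
  apiKey: process.env.SERVER_SECRET ?? "none", // only checked when started with --secret
});

async function main() {
  const stream = await client.chat.completions.create({
    model: "jina-deepsearch-v1",
    messages: [{ role: "user", content: "what is the context length of readerlm-v2?" }],
    stream: true,
  });

  let buffer = "";
  for await (const chunk of stream) {
    buffer += chunk.choices[0]?.delta?.content ?? "";
  }

  // Everything between <think> and </think> is the reasoning trace; the rest is the answer.
  const answer = buffer.replace(/<think>[\s\S]*?<\/think>/, "").trim();
  console.log(answer);
}

main();
```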
## Docker Setup

### Build Docker Image

To build the Docker image for the application, run the following command:

```bash
docker build -t deepresearch:latest .
```

### Run Docker Container

To run the Docker container, use the following command:

```bash
docker run -p 3000:3000 --env GEMINI_API_KEY=your_gemini_api_key --env JINA_API_KEY=your_jina_api_key deepresearch:latest
```

### Docker Compose

You can also use Docker Compose to manage multi-container applications. To start the application with Docker Compose, run:

```bash
docker-compose up
```
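The repository ships its own compose file; for orientation, here is just a minimal sketch of what such a `docker-compose.yml` can look like, assuming the same image, port, and environment variables as the commands above:

```yaml
# Minimal sketch; the repository's own docker-compose.yml takes precedence.
services:
  deepresearch:
    build: .
    ports:
      - "3000:3000"
    environment:
      - GEMINI_API_KEY=${GEMINI_API_KEY}
      - JINA_API_KEY=${JINA_API_KEY}
```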
## How Does it Work?

Not sure a flowchart helps, but here it is:

```mermaid
flowchart TD
    Start([Start]) --> Init[Initialize context & variables]
    Init --> CheckBudget{Token budget exceeded?}
    CheckBudget -->|No| GetQuestion[Get current question from gaps]
    CheckBudget -->|Yes| BeastMode[Enter Beast Mode]

    GetQuestion --> GenPrompt[Generate prompt]
    GenPrompt --> ModelGen[Generate response using Gemini]
    ModelGen --> ActionCheck{Check action type}

    ActionCheck -->|answer| AnswerCheck{Is original question?}
    AnswerCheck -->|Yes| EvalAnswer[Evaluate answer]
    EvalAnswer --> IsGoodAnswer{Is answer definitive?}
    IsGoodAnswer -->|Yes| HasRefs{Has references?}
    HasRefs -->|Yes| End([End])
    HasRefs -->|No| GetQuestion
    IsGoodAnswer -->|No| StoreBad[Store bad attempt, reset context]
    StoreBad --> GetQuestion
    AnswerCheck -->|No| StoreKnowledge[Store as intermediate knowledge]
    StoreKnowledge --> GetQuestion

    ActionCheck -->|reflect| ProcessQuestions[Process new sub-questions]
    ProcessQuestions --> DedupQuestions{New unique questions?}
    DedupQuestions -->|Yes| AddGaps[Add to gaps queue]
    DedupQuestions -->|No| DisableReflect[Disable reflect for next step]
    AddGaps --> GetQuestion
    DisableReflect --> GetQuestion

    ActionCheck -->|search| SearchQuery[Execute search]
    SearchQuery --> NewURLs{New URLs found?}
    NewURLs -->|Yes| StoreURLs[Store URLs for future visits]
    NewURLs -->|No| DisableSearch[Disable search for next step]
    StoreURLs --> GetQuestion
    DisableSearch --> GetQuestion

    ActionCheck -->|visit| VisitURLs[Visit URLs]
    VisitURLs --> NewContent{New content found?}
    NewContent -->|Yes| StoreContent[Store content as knowledge]
    NewContent -->|No| DisableVisit[Disable visit for next step]
    StoreContent --> GetQuestion
    DisableVisit --> GetQuestion

    BeastMode --> FinalAnswer[Generate final answer] --> End
```
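If code reads more easily than the diagram, here is a heavily simplified, self-contained sketch of the same loop. All names, types, and stub implementations are illustrative assumptions, not the repository's actual code.

```typescript
// Simplified sketch of the search/read/reason loop above. The stubs stand in
// for the real LLM call (structured output), web search, and page reads.
type Action =
  | { type: "answer"; text: string; definitive: boolean; hasReferences: boolean }
  | { type: "reflect"; subQuestions: string[] }
  | { type: "search"; query: string }
  | { type: "visit"; urls: string[] };

async function generateAction(_question: string, _knowledge: string[]): Promise<{ action: Action; tokens: number }> {
  // In the real agent this is one LLM call that returns a structured action.
  return { action: { type: "answer", text: "stub", definitive: true, hasReferences: true }, tokens: 500 };
}
async function search(_query: string): Promise<string[]> { return []; }  // search results / URLs
async function visit(_urls: string[]): Promise<string[]> { return []; }  // page contents
async function beastMode(_q: string, _k: string[]): Promise<string> { return "forced final answer"; }

export async function deepResearch(question: string, tokenBudget = 1_000_000): Promise<string> {
  const gaps: string[] = [question];  // open (sub-)questions; the original question stays in rotation
  const knowledge: string[] = [];     // intermediate knowledge gathered along the way
  let tokensUsed = 0;

  while (tokensUsed < tokenBudget) {
    const current = gaps.shift() ?? question;
    const { action, tokens } = await generateAction(current, knowledge);
    tokensUsed += tokens;

    if (action.type === "answer") {
      // A definitive, referenced answer to the original question ends the loop.
      if (current === question && action.definitive && action.hasReferences) return action.text;
      knowledge.push(action.text); // otherwise record it and keep going (the real agent also tracks bad attempts)
    } else if (action.type === "reflect") {
      gaps.push(...action.subQuestions); // queue new sub-questions (the real agent dedups them)
    } else if (action.type === "search") {
      knowledge.push(...await search(action.query));
    } else {
      knowledge.push(...await visit(action.urls));
    }
    gaps.push(current); // revisit the question until it is answered
  }
  return beastMode(question, knowledge); // budget exceeded: force a final answer
}
```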
## Evaluation

I kept the evaluation simple: LLM-as-a-judge over some [ego questions](./src/evals/ego-questions.json) (a sketch of the judging step is shown after the table below). These are questions about Jina AI whose answers I know with 100% certainty but LLMs do not. I mainly look at 3 things: total steps, total tokens, and the correctness of the final answer.

```bash
npm run eval ./src/evals/questions.json
```

Here's the table comparing plain `gemini-2.0-flash` and `gemini-2.0-flash + node-deepresearch` on the ego set. Plain `gemini-2.0-flash` can be run by setting `tokenBudget` to zero, which skips the while-loop and answers the question directly. It should not be surprising that plain `gemini-2.0-flash` has a 0% pass rate, as I intentionally filtered out questions that LLMs can already answer.

| Metric | `gemini-2.0-flash` | `gemini-2.0-flash` + node-deepresearch (#188f1bb) |
|--------|--------------------|---------------------------------------------------|
| Pass Rate | 0% | 75% |
| Average Steps | 1 | 4 |
| Maximum Steps | 1 | 13 |
| Minimum Steps | 1 | 2 |
| Median Steps | 1 | 3 |
| Average Tokens | 428 | 68,574 |
| Median Tokens | 434 | 31,541 |
| Maximum Tokens | 463 | 363,655 |
| Minimum Tokens | 374 | 7,963 |
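For reference, here is a hedged sketch of the LLM-as-a-judge step, assuming a simple pass/fail prompt against a judge model; the helper name, the judge model, and the prompt wording are illustrative assumptions, not the actual code in `./src/evals`:

```typescript
import OpenAI from "openai";

// Illustrative LLM-as-a-judge helper: ask a judge model whether the agent's
// answer conveys the same fact as the expected answer.
const judge = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function isCorrect(question: string, expected: string, actual: string): Promise<boolean> {
  const res = await judge.chat.completions.create({
    model: "gpt-4o-mini", // assumption: any capable judge model works here
    messages: [{
      role: "user",
      content:
        `Question: ${question}\n` +
        `Expected answer: ${expected}\n` +
        `Given answer: ${actual}\n` +
        `Does the given answer convey the same fact as the expected answer? Reply only "yes" or "no".`,
    }],
  });
  return (res.choices[0].message.content ?? "").trim().toLowerCase().startsWith("yes");
}
```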