# DeepResearch
[Demo](https://jina.ai/deepsearch#demo) | [API](#official-api) | [Evaluation](#evaluation)
Keep searching, reading webpages, reasoning until an answer is found (or the token budget is exceeded). Useful for deeply investigating a query.
```mermaid
---
config:
  theme: mc
  look: handDrawn
---
flowchart LR
    subgraph Loop["until budget exceeded"]
        direction LR
        Search["Search"]
        Read["Read"]
        Reason["Reason"]
    end
    Query(["Query"]) --> Loop
    Search --> Read
    Read --> Reason
    Reason --> Search
    Loop --> Answer(["Answer"])
```
Unlike the Deep Research capabilities of OpenAI and Gemini, we focus solely on **delivering accurate answers through our iterative process**. We don't optimize for long-form articles; if you need quick, precise answers from deep search, you're in the right place. If you're looking for AI-generated reports like those from OpenAI/Gemini, this isn't for you.
## Install
```bash
git clone https://github.com/jina-ai/node-DeepResearch.git
cd node-DeepResearch
npm install
```
[Installation & deployment video tutorial on YouTube](https://youtu.be/vrpraFiPUyA)
It is also available on npm but not recommended for now, as the code is still under active development.
## Usage
We use Gemini (latest `gemini-2.0-flash`), OpenAI, or a [local LLM](#use-local-llm) for reasoning, and [Jina Reader](https://jina.ai/reader) for searching and reading webpages. You can get a free Jina API key with 1M tokens from jina.ai.
```bash
export GEMINI_API_KEY=... # for gemini
# export OPENAI_API_KEY=... # for openai
# export LLM_PROVIDER=openai # for openai
export JINA_API_KEY=jina_... # free jina api key, get from https://jina.ai/reader
npm run dev $QUERY
```
### Official API
You can also use our official DeepSearch API, hosted and optimized by Jina AI:
```
https://deepsearch.jina.ai/v1/chat/completions
```
You can use it with any OpenAI-compatible client. For authentication, pass your Jina API key as the Bearer token; get one from https://jina.ai
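For example, here's a minimal sketch using the official `openai` Node SDK (the query is just an illustration; any OpenAI-compatible client works the same way):
```typescript
// Minimal sketch: calling the DeepSearch API with the `openai` npm package.
// Assumes JINA_API_KEY is set in the environment.
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://deepsearch.jina.ai/v1',
  apiKey: process.env.JINA_API_KEY,
});

const completion = await client.chat.completions.create({
  model: 'jina-deepsearch-v1',
  messages: [{ role: 'user', content: 'what is the latest blog post from jina ai?' }],
});

console.log(completion.choices[0].message.content);
```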
#### Client integration guidelines
If you are building a web/local/mobile client that uses `Jina DeepSearch API`, here are some design guidelines:
- Our API is fully compatible with [OpenAI API schema](https://platform.openai.com/docs/api-reference/chat/create), this should greatly simplify the integration process. The model name is `jina-deepsearch-v1`.
- Our DeepSearch API is a reasoning+search grounding LLM, so it's best for questions that require deep reasoning and search.
- Two special tokens are introduced, `<think>` and `</think>`, which wrap the reasoning content. Please render them with care (see the sketch after this list).
- Guide the user to get a Jina API key from https://jina.ai; every new API key comes with 1M free tokens.
- There are rate limits, [between 10RPM to 30RPM depending on the API key tier](https://jina.ai/contact-sales#rate-limit).
- [Download Jina AI logo here](https://jina.ai/logo-Jina-1024.zip)
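For reference, here's a hypothetical rendering helper, assuming the reasoning is wrapped in `<think>...</think>` as in the streaming responses below (the function name and shape are illustrative, not part of the API):
```typescript
// Hypothetical helper: separate the reasoning from the final answer for rendering.
function splitThinking(content: string): { thinking: string; answer: string } {
  const match = content.match(/<think>([\s\S]*?)<\/think>/);
  return {
    thinking: match ? match[1].trim() : '',
    answer: content.replace(/<think>[\s\S]*?<\/think>/, '').trim(),
  };
}
```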
## Demo
> This demo was recorded with `gemini-1.5-flash`; the latest `gemini-2.0-flash` leads to much better results!
Query: `"what is the latest blog post's title from jina ai?"`
3 steps; answer is correct!

Query: `"what is the context length of readerlm-v2?"`
2 steps; answer is correct!

Query: `"list all employees from jina ai that u can find, as many as possible"`
11 steps; partially correct! But I'm not in the list :(

Query: `"who will be the biggest competitor of Jina AI"`
42 steps; it's a future prediction, so it's arguably correct! At the moment I don't see `weaviate` as a competitor, but I'm open to a future "I told you so" moment.

More examples:
```
# example: no tool calling
npm run dev "1+1="
npm run dev "what is the capital of France?"
# example: 2-step
npm run dev "what is the latest news from Jina AI?"
# example: 3-step
npm run dev "what is the twitter account of jina ai's founder"
# example: 13-step, ambiguous question (no def of "big")
npm run dev "who is bigger? cohere, jina ai, voyage?"
# example: open question, research-like, long chain of thoughts
npm run dev "who will be president of US in 2028?"
npm run dev "what should be jina ai strategy for 2025?"
```
## Use Local LLM
> Note: not every LLM works with our reasoning flow; we need models that support structured output (sometimes called JSON Schema output or object output) well. Feel free to propose a PR to add more open-source LLMs to the working list.
If you use Ollama or LMStudio, you can redirect the reasoning request to your local LLM by setting the following environment variables:
```bash
export LLM_PROVIDER=openai # yes, that's right - for local llm we still use openai client
export OPENAI_BASE_URL=http://127.0.0.1:1234/v1 # your local llm endpoint
export OPENAI_API_KEY=whatever # random string would do, as we don't use it (unless your local LLM has authentication)
export DEFAULT_MODEL_NAME=qwen2.5-7b # your local llm model name
```
## OpenAI-Compatible Server API
If you have a GUI client that supports the OpenAI API (e.g. [CherryStudio](https://docs.cherry-ai.com/), [Chatbox](https://github.com/Bin-Huang/chatbox)), you can simply configure it to use this server.

Start the server:
```bash
# Without authentication
npm run serve
# With authentication (clients must provide this secret as Bearer token)
npm run serve --secret=your_secret_token
```
The server will start on http://localhost:3000 with the following endpoint:
### POST /v1/chat/completions
```bash
# Without authentication
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jina-deepsearch-v1",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

# With authentication (when server is started with --secret)
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_secret_token" \
  -d '{
    "model": "jina-deepsearch-v1",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true
  }'
```
Response format:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "jina-deepsearch-v1",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "YOUR FINAL ANSWER"
    },
    "logprobs": null,
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
For streaming responses (`stream: true`), the server sends chunks in this format:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "jina-deepsearch-v1",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [{
    "index": 0,
    "delta": {
      "content": "..."
    },
    "logprobs": null,
    "finish_reason": null
  }]
}
```
Note: The think content in streaming responses is wrapped in XML tags:
```
<think>
[thinking steps...]
</think>
[final answer]
```
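For reference, a minimal sketch of consuming the stream with the `openai` Node SDK (the Bearer token matters only if the server was started with `--secret`):
```typescript
// Minimal sketch: streaming from the local server with the `openai` npm package.
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:3000/v1',
  apiKey: 'your_secret_token', // any string works if the server runs without --secret
});

const stream = await client.chat.completions.create({
  model: 'jina-deepsearch-v1',
  messages: [{ role: 'user', content: 'what is the capital of France?' }],
  stream: true,
});

for await (const chunk of stream) {
  // Chunks carry the <think>...</think> reasoning before the final answer.
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```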
## Docker Setup
### Build Docker Image
To build the Docker image for the application, run the following command:
```bash
docker build -t deepresearch:latest .
```
### Run Docker Container
To run the Docker container, use the following command:
```bash
docker run -p 3000:3000 --env GEMINI_API_KEY=your_gemini_api_key --env JINA_API_KEY=your_jina_api_key deepresearch:latest
```
### Docker Compose
You can also use Docker Compose to manage multi-container applications. To start the application with Docker Compose, run:
```bash
docker-compose up
```
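For reference, a minimal `docker-compose.yml` sketch, assuming the image built above and the same environment variables (the repo's own compose file may differ):
```yaml
# Hypothetical minimal compose file; adjust to match the repo's actual docker-compose.yml.
services:
  deepresearch:
    image: deepresearch:latest
    ports:
      - "3000:3000"
    environment:
      - GEMINI_API_KEY=your_gemini_api_key
      - JINA_API_KEY=your_jina_api_key
```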
## How Does it Work?
Not sure a flowchart helps, but here it is:
```mermaid
flowchart TD
Start([Start]) --> Init[Initialize context & variables]
Init --> CheckBudget{Token budget
exceeded?}
CheckBudget -->|No| GetQuestion[Get current question
from gaps]
CheckBudget -->|Yes| BeastMode[Enter Beast Mode]
GetQuestion --> GenPrompt[Generate prompt]
GenPrompt --> ModelGen[Generate response
using Gemini]
ModelGen --> ActionCheck{Check action
type}
ActionCheck -->|answer| AnswerCheck{Is original
question?}
AnswerCheck -->|Yes| EvalAnswer[Evaluate answer]
EvalAnswer --> IsGoodAnswer{Is answer
definitive?}
IsGoodAnswer -->|Yes| HasRefs{Has
references?}
HasRefs -->|Yes| End([End])
HasRefs -->|No| GetQuestion
IsGoodAnswer -->|No| StoreBad[Store bad attempt
Reset context]
StoreBad --> GetQuestion
AnswerCheck -->|No| StoreKnowledge[Store as intermediate
knowledge]
StoreKnowledge --> GetQuestion
ActionCheck -->|reflect| ProcessQuestions[Process new
sub-questions]
ProcessQuestions --> DedupQuestions{New unique
questions?}
DedupQuestions -->|Yes| AddGaps[Add to gaps queue]
DedupQuestions -->|No| DisableReflect[Disable reflect
for next step]
AddGaps --> GetQuestion
DisableReflect --> GetQuestion
ActionCheck -->|search| SearchQuery[Execute search]
SearchQuery --> NewURLs{New URLs
found?}
NewURLs -->|Yes| StoreURLs[Store URLs for
future visits]
NewURLs -->|No| DisableSearch[Disable search
for next step]
StoreURLs --> GetQuestion
DisableSearch --> GetQuestion
ActionCheck -->|visit| VisitURLs[Visit URLs]
VisitURLs --> NewContent{New content
found?}
NewContent -->|Yes| StoreContent[Store content as
knowledge]
NewContent -->|No| DisableVisit[Disable visit
for next step]
StoreContent --> GetQuestion
DisableVisit --> GetQuestion
BeastMode --> FinalAnswer[Generate final answer] --> End
```
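In code terms, the loop boils down to something like the following hypothetical sketch; every identifier here is an illustrative stub, not the actual implementation:
```typescript
// Hypothetical sketch of the agent loop; names are illustrative, not the real code.
type Action =
  | { type: 'search'; query: string }
  | { type: 'visit'; urls: string[] }
  | { type: 'reflect'; subQuestions: string[] }
  | { type: 'answer'; answer: string };

// Stubs standing in for the LLM call, the Jina Reader tools, and the judge.
declare function generateAction(q: string, knowledge: string[]): Promise<{ action: Action; tokens: number }>;
declare function search(query: string): Promise<string[]>;
declare function readUrls(urls: string[]): Promise<string[]>;
declare function isDefinitive(answer: string): Promise<boolean>;
declare function beastMode(q: string, knowledge: string[]): Promise<string>;

async function deepResearch(question: string, tokenBudget: number): Promise<string> {
  const gaps: string[] = [question]; // queue of open (sub-)questions
  const knowledge: string[] = [];    // intermediate findings
  let tokensUsed = 0;

  while (tokensUsed < tokenBudget) {
    const current = gaps.shift() ?? question;
    const { action, tokens } = await generateAction(current, knowledge);
    tokensUsed += tokens;

    if (action.type === 'answer') {
      if (current === question) {
        if (await isDefinitive(action.answer)) return action.answer;
        gaps.push(current); // bad attempt: re-queue the original question
      } else {
        knowledge.push(action.answer); // sub-question answered: store as knowledge
      }
      continue;
    }

    if (action.type === 'search') knowledge.push(...(await search(action.query)));
    if (action.type === 'visit') knowledge.push(...(await readUrls(action.urls)));
    if (action.type === 'reflect') gaps.push(...action.subQuestions.filter((q) => !gaps.includes(q)));
    gaps.push(current); // revisit this question with the new knowledge
  }
  return beastMode(question, knowledge); // budget exceeded: force a final answer
}
```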
## Evaluation
I kept the evaluation simple: LLM-as-a-judge over a set of [ego questions](./src/evals/ego-questions.json). These are questions about Jina AI whose answers I know 100% but LLMs do not.
I mainly look at three things: total steps, total tokens, and the correctness of the final answer.
```bash
npm run eval ./src/evals/questions.json
```
Here's a table comparing plain `gemini-2.0-flash` and `gemini-2.0-flash + node-deepresearch` on the ego set.
Plain `gemini-2.0-flash` can be run by setting `tokenBudget` to zero, which skips the while-loop and answers the question directly.
It should not be surprising that plain `gemini-2.0-flash` has a 0% pass rate, as I intentionally filtered out questions that LLMs can answer.
| Metric | gemini-2.0-flash | gemini-2.0-flash + node-deepresearch (#188f1bb) |
|--------|------------------|----------|
| Pass Rate | 0% | 75% |
| Average Steps | 1 | 4 |
| Maximum Steps | 1 | 13 |
| Minimum Steps | 1 | 2 |
| Median Steps | 1 | 3 |
| Average Tokens | 428 | 68,574 |
| Median Tokens | 434 | 31,541 |
| Maximum Tokens | 463 | 363,655 |
| Minimum Tokens | 374 | 7,963 |