Update README.md

README.md (CHANGED)

@@ -1,141 +1,5 @@

**Removed content:**

An ensemble of LangChain's [Contextual compression](https://python.langchain.com/docs/modules/data_connection/retrievers/contextual_compression/) and [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) (or alternatively, [SPLADE](https://github.com/naver/splade)) is used to extract the relevant parts (if any) of each web page in the search results, and the results are appended to the model's output.

* **[Table of Contents](#table-of-contents)**
  * [Installation](#installation)
  * [Usage](#usage)
    + [Using a custom regular expression](#using-a-custom-regular-expression)
    + [Reading web pages](#reading-web-pages)
  * [Search backends](#search-backends)
    + [DuckDuckGo](#duckduckgo)
    + [SearXNG](#searxng)
    + [Search parameters](#search-parameters)
  * [Keyword retrievers](#keyword-retrievers)
    + [Okapi BM25](#okapi-bm25)
    + [SPLADE](#splade)
  * [Recommended models](#recommended-models)
## Installation

1. Go to the "Session" tab of the web UI and use "Install or update an extension" to download the latest code for this extension.
2. To install the extension's dependencies, you have two options:
   1. **The easy way:** Run the appropriate `update_wizard` script inside the text-generation-webui folder and choose `Install/update extensions requirements`. This installs everything using `pip`, which means using the unofficial `faiss-cpu` package. It is therefore not guaranteed to work with your system (see [the official disclaimer](https://github.com/facebookresearch/faiss/wiki/Installing-Faiss#why-dont-you-support-installing-via-xxx-)).
   2. **The safe way:** Manually update the conda environment in which you installed the dependencies of [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui). Open the subfolder `text-generation-webui/extensions/LLM_Web_search` in a terminal or conda shell. If you used the one-click install method, run the command `conda env update -p <path_to_your_environment> --file environment.yml`, replacing `<path_to_your_environment>` with the path to the `/installer_files/env` subfolder within the text-generation-webui folder. Otherwise, if you made your own environment, use `conda env update -n <name_of_your_environment> --file environment.yml` (NB: solving the environment can take a while).
3. Launch the Web UI with:
```python server.py --extension LLM_Web_search```

If the installation was successful and the extension was loaded, a new tab with the title "LLM Web Search" should be visible in the web UI.

See https://github.com/oobabooga/text-generation-webui/wiki/07-%E2%80%90-Extensions for more information about extensions.
## Usage

Search queries are extracted from the model's output using a regular expression. This is made easier by prompting the model to use a fixed search command (see `system_prompts/` for example prompts). Currently, only a single search query per model chat message is supported.

An example workflow of using this extension could be:
1. Load a model
2. Load a matching instruction template
3. Head over to the "LLM Web search" tab
4. Load a custom system message/prompt
5. Ensure that the query part of the command mentioned in the system message can be matched using the current "Search command regex string" (see "Using a custom regular expression" below)
6. Pick a hyperparameter generation preset that works well for you
7. Choose "chat-instruct" or "instruct" mode and start chatting
### Using a custom regular expression
The default regular expression is:
```regexp
Search_web\("(.*)"\)
```
Here, `Search_web` is the search command, and everything between the quotation marks inside the parentheses will be used as the search query. Every custom regular expression must use a [capture group](https://www.regular-expressions.info/brackets.html) to extract the search query. I recommend https://www.debuggex.com/ for trying out custom regular expressions. If a regex fulfills the requirement above, the search query should be matched by "Group 1" in Debuggex.

Here is an example of a more flexible, but more complex, regex that works for several different models:
```regexp
[Ss]earch_web\((?:["'])(.*)(?:["'])\)
```
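As an illustration of the capture-group requirement, here is a minimal Python sketch (not part of the extension's code; the message text is made up) that extracts the query from a model message using the flexible regex above:

```python
import re

# Hypothetical model output containing a search command.
message = 'Let me look that up. Search_web("latest SearXNG release")'

# The flexible regex from above; capture group 1 holds the query.
pattern = re.compile(r'''[Ss]earch_web\((?:["'])(.*)(?:["'])\)''')

match = pattern.search(message)
if match:
    print("Search query:", match.group(1))  # -> latest SearXNG release
else:
    print("No search command found.")
```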
### Reading web pages
Experimental support exists for extracting the full text content of a web page. The default regex to use this functionality is:
```regexp
Open_url\("(.*)"\)
```
**Note**: The full content of a web page is likely to exceed the maximum context length of your average local LLM.
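To illustrate why this matters, here is a rough, standalone sketch of naive full-text extraction, using `requests` and BeautifulSoup rather than necessarily the libraries the extension uses; even the cleaned text of a single article can easily run to thousands of tokens:

```python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def fetch_page_text(url: str, timeout: int = 10) -> str:
    """Download a page and return its visible text (illustrative only)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop scripts and styles so only readable text remains.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())

text = fetch_page_text("https://en.wikipedia.org/wiki/Okapi_BM25")
print(len(text.split()), "words")  # often far more than a local LLM's context window
```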
## Search backends

### DuckDuckGo
This is the default web search backend.

### SearXNG

Rudimentary support exists for SearXNG. To use a local or remote SearXNG instance instead of DuckDuckGo, simply paste the URL into the "SearXNG URL" text field of the "LLM Web Search" settings tab. The instance must support returning results in JSON format.
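If you want to verify that an instance returns JSON before pasting its URL, you can query it directly. This is a generic SearXNG request, not something the extension requires you to run, and the URL below is a placeholder for your own instance:

```python
import requests

searxng_url = "http://localhost:8888"  # placeholder; use your instance's URL

# JSON output only works if the instance enables the "json" format in its
# settings; a 403 response usually means it is not enabled.
response = requests.get(
    f"{searxng_url}/search",
    params={"q": "open source metasearch engines", "format": "json"},
    timeout=10,
)
response.raise_for_status()
for result in response.json().get("results", [])[:3]:
    print(result.get("title"), "->", result.get("url"))
```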
#### Search parameters
To modify the categories, engines, languages etc. that should be used for a specific query, the query must follow the [SearXNG search syntax](https://docs.searxng.org/user/search-syntax.html). Currently, automatic redirect and Special Queries are not supported.
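For example, following the linked syntax, the model could restrict a search to the news category and German-language results by emitting something like (the query text is made up):

```
Search_web(":de !news aktuelle Nachrichten zu erneuerbaren Energien")
```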
## Keyword retrievers
### Okapi BM25
This extension comes out of the box with [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) enabled, which is widely used and very popular for keyword-based document retrieval. It runs on the CPU and, for the purpose of this extension, it is fast.
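For intuition, here is a small, self-contained sketch of BM25 keyword retrieval using the `rank_bm25` package; it only illustrates the idea and is not the extension's internal code (the chunks are made up):

```python
from rank_bm25 import BM25Okapi  # pip install rank_bm25

# Toy "documents", e.g. text chunks taken from search result pages.
chunks = [
    "SPLADE is a sparse neural retrieval model with query expansion.",
    "Okapi BM25 is a classic ranking function for keyword search.",
    "The weather in Berlin is expected to be sunny tomorrow.",
]

# BM25 works on tokenized text; simple whitespace splitting is enough here.
tokenized_chunks = [chunk.lower().split() for chunk in chunks]
bm25 = BM25Okapi(tokenized_chunks)

query = "bm25 keyword ranking".split()
print(bm25.get_scores(query))              # one relevance score per chunk
print(bm25.get_top_n(query, chunks, n=1))  # best-matching chunk
```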
### SPLADE
If you don't run the extension in "CPU only" mode and have some VRAM to spare, you can also select [SPLADE](https://github.com/naver/splade) in the "Advanced settings" section as an alternative. It has been [shown](https://arxiv.org/pdf/2207.03834.pdf) to outperform BM25 in multiple benchmarks and uses a technique called "query expansion" to add additional contextually relevant words to the original query. However, it is slower than BM25. You can read more about it [here](https://www.pinecone.io/learn/splade/).

To use SPLADE, you have to install the additional dependency [qdrant-client](https://github.com/qdrant/qdrant-client). Simply activate the conda environment of the main web UI and run `pip install qdrant-client`.

To improve performance, documents are embedded in batches and in parallel. Increasing the "SPLADE batch size" setting improves performance up to a certain point, but VRAM usage ramps up quickly with increasing batch size. A batch size of 8 appears to be a good trade-off, but the default value is 2 to avoid running out of memory on smaller GPUs.
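As a rough illustration of what SPLADE does, and not the extension's implementation, the following sketch turns text into sparse, vocabulary-sized vectors whose dot product acts as a relevance score; the checkpoint name is an assumption made for the example:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumed SPLADE checkpoint, chosen only for illustration.
model_name = "naver/splade-cocondenser-ensembledistil"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

def splade_vector(text: str) -> torch.Tensor:
    """Return a sparse vector over the vocabulary for the given text."""
    tokens = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**tokens).logits  # (1, seq_len, vocab_size)
    # SPLADE activation: log(1 + ReLU(logits)), masked and max-pooled over tokens.
    weights = torch.log1p(torch.relu(logits)) * tokens["attention_mask"].unsqueeze(-1)
    return weights.max(dim=1).values.squeeze(0)  # (vocab_size,)

query_vec = splade_vector("okapi bm25 ranking function")
doc_vec = splade_vector("Okapi BM25 is a classic ranking function for keyword search.")
print(float(query_vec @ doc_vec))  # higher dot product = more relevant
```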
## Recommended models
If you (like me) have ≤ 12 GB VRAM, I recommend using [Llama-3-8B-instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct). You can find a matching instruction template in the extension's `instruction_templates` folder. Simply copy it to the main web UI's `instruction-templates` folder.

**Note:** Several existing GGUF versions have a stop token issue, which can be solved by [editing the file's metadata](https://www.reddit.com/r/LocalLLaMA/comments/1c7dkxh/tutorial_how_to_make_llama3instruct_ggufs_less/). A GGUF version where this issue has already been fixed can be found [here](https://huggingface.co/AI-Engine/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct.Q5_k_m_with_temp_stop_token_fix.gguf).
**Added content** (Hugging Face Spaces front matter):

    ---
    title: LLMsearch
    emoji: 💬
    colorFrom: yellow
    colorTo: purple