It is important that you review the Main Concepts before you start the installation process.

## Base requirements to run PrivateGPT

* Clone PrivateGPT repository, and navigate to it:

```bash
  git clone https://github.com/zylon-ai/private-gpt
  cd private-gpt
```

* Install Python `3.11` (*if you do not have it already*), ideally through a Python version manager like `pyenv`.
  Earlier Python versions are not supported.
    * osx/linux: [pyenv](https://github.com/pyenv/pyenv)
    * windows: [pyenv-win](https://github.com/pyenv-win/pyenv-win)

```bash
pyenv install 3.11
pyenv local 3.11
```

* Install [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) for dependency management (see the example after this list).

* Install `make` to be able to run the different scripts:
    * osx: (Using homebrew): `brew install make`
    * windows: (Using chocolatey): `choco install make`
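
For reference, here is a minimal sketch of installing Poetry with its official installer and verifying the toolchain; adjust it if you prefer another installation method such as `pipx`:

```bash
# Install Poetry using the official installer (see the Poetry docs linked above for alternatives)
curl -sSL https://install.python-poetry.org | python3 -

# Sanity-check the toolchain
python --version    # should report 3.11.x (picked up via `pyenv local 3.11`)
poetry --version
make --version
```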

## Install and run your desired setup

PrivateGPT allows you to customize the setup, from fully local to cloud-based, by deciding which modules to use.
Here are the different options available:

- LLM: "llama-cpp", "ollama", "sagemaker", "openai", "openailike", "azopenai"
- Embeddings: "huggingface", "openai", "sagemaker", "azopenai"
- Vector stores: "qdrant", "chroma", "postgres"
- UI: whether to enable the UI (Gradio) or just go with the API

To install only the required dependencies, PrivateGPT offers different `extras` that can be combined during the installation process:

```bash
poetry install --extras "<extra1> <extra2>..."
```

Where `<extra>` can be any of the following:

- ui: adds support for UI using Gradio
- llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running, requires Ollama running locally
- llms-llama-cpp: adds support for local LLM using LlamaCPP - expect a messy installation process on some platforms
- llms-sagemaker: adds support for Amazon Sagemaker LLM, requires Sagemaker inference endpoints
- llms-openai: adds support for OpenAI LLM, requires OpenAI API key
- llms-openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
- llms-azopenai: adds support for Azure OpenAI LLM, requires Azure OpenAI inference endpoints
- embeddings-ollama: adds support for Ollama Embeddings, requires Ollama running locally
- embeddings-huggingface: adds support for local Embeddings using HuggingFace
- embeddings-sagemaker: adds support for Amazon Sagemaker Embeddings, requires Sagemaker inference endpoints
- embeddings-openai: adds support for OpenAI Embeddings, requires OpenAI API key
- embeddings-azopenai: adds support for Azure OpenAI Embeddings, requires Azure OpenAI inference endpoints
- vector-stores-qdrant: adds support for Qdrant vector store
- vector-stores-chroma: adds support for Chroma DB vector store
- vector-stores-postgres: adds support for Postgres vector store
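
As a concrete example, a fully local install combining several of the extras above could look like the following; the exact combination depends on the setup you choose in the next section:

```bash
# Example: Gradio UI + Ollama LLM and embeddings + Qdrant vector store
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
```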

## Recommended Setups

These are just some examples of recommended setups. You can mix and match the different options to fit your needs.
You'll find more information in the Manual section of the documentation.

> **Important for Windows**: In the examples below on how to run PrivateGPT with `make run`, the `PGPT_PROFILES` env var is set inline following Unix command-line syntax (works on macOS and Linux).
If you are using Windows, you'll need to set the env var in a different way, for example:

```powershell
# Powershell
$env:PGPT_PROFILES="ollama"
make run
```

or

```cmd
# CMD
set PGPT_PROFILES=ollama
make run
```

### Local, Ollama-powered setup - RECOMMENDED

**The easiest way to run PrivateGPT fully locally** is to depend on Ollama for the LLM. Ollama makes local LLMs and Embeddings very easy to install and use, abstracting away the complexity of GPU support. It's the recommended setup for local development.

Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Ollama on your machine.

After the installation, make sure the Ollama desktop app is closed.

Install the models to be used. The default `settings-ollama.yaml` is configured to use the `mistral 7b` LLM (~4GB) and `nomic-embed-text` Embeddings (~275MB), so pull both:

```bash
ollama pull mistral
ollama pull nomic-embed-text
```
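
Optionally, you can check that both models were downloaded correctly:

```bash
ollama list    # mistral and nomic-embed-text should appear in the list
```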

Now, start the Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):
```bash
ollama serve
```
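
If you want to confirm the server is up, a quick request against Ollama's default port should answer with a short status message:

```bash
curl http://localhost:11434    # typically replies "Ollama is running"
```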

Once done, on a different terminal, you can install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
```

Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.

```bash
PGPT_PROFILES=ollama make run
```

PrivateGPT will use the already existing `settings-ollama.yaml` settings file, which is already configured to use Ollama LLM and Embeddings, and Qdrant. Review it and adapt it to your needs (different models, different Ollama port, etc.)

The UI will be available at http://localhost:8001
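
The API is served on the same port. As a rough smoke test (assuming the default health route is enabled), you can query it directly:

```bash
# Assumes the default /health route; the interactive API docs are usually at /docs
curl http://localhost:8001/health
```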

### Private, Sagemaker-powered setup

If you need more performance, you can run a version of PrivateGPT that relies on powerful AWS Sagemaker machines to serve the LLM and Embeddings.

You need to have access to Sagemaker inference endpoints for the LLM and/or the embeddings, and have AWS credentials properly configured.
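
How you provide AWS credentials is up to you. As a sketch, they can be supplied through the standard AWS environment variables or the AWS CLI (all values below are placeholders):

```bash
# Option A: standard AWS environment variables (placeholder values)
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_DEFAULT_REGION="<your-region>"

# Option B: interactive configuration via the AWS CLI
aws configure
```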

Edit the `settings-sagemaker.yaml` file to include the correct Sagemaker endpoints.

Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-sagemaker embeddings-sagemaker vector-stores-qdrant"
```

Once installed, you can run PrivateGPT.

```bash
PGPT_PROFILES=sagemaker make run
```

PrivateGPT will use the already existing `settings-sagemaker.yaml` settings file, which is already configured to use Sagemaker LLM and Embeddings endpoints, and Qdrant.

The UI will be available at http://localhost:8001

### Non-Private, OpenAI-powered test setup

If you want to test PrivateGPT with OpenAI's LLM and Embeddings (keep in mind your data will be sent to OpenAI!), follow the steps below.

You need an OpenAI API key to run this setup.

Edit the `settings-openai.yaml` file to include the correct API key. Never commit it! It's a secret! As an alternative to editing `settings-openai.yaml`, you can just set the env var `OPENAI_API_KEY`.
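
For example, to provide the key through the environment instead of the settings file (the value is a placeholder):

```bash
# Keep the key out of version control
export OPENAI_API_KEY="<your-openai-api-key>"
```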

Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
```

Once installed, you can run PrivateGPT.

```bash
PGPT_PROFILES=openai make run
```

PrivateGPT will use the already existing `settings-openai.yaml` settings file, which is already configured to use OpenAI LLM and Embeddings endpoints, and Qdrant.

The UI will be available at http://localhost:8001

### Non-Private, Azure OpenAI-powered test setup

If you want to test PrivateGPT with Azure OpenAI's LLM and Embeddings (keep in mind your data will be sent to Azure OpenAI!), follow the steps below.

You need to have access to Azure OpenAI inference endpoints for the LLM and/or the embeddings, and have Azure OpenAI credentials properly configured.

Edit the `settings-azopenai.yaml` file to include the correct Azure OpenAI endpoints.

Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-azopenai embeddings-azopenai vector-stores-qdrant"
```

Once installed, you can run PrivateGPT.

```bash
PGPT_PROFILES=azopenai make run
```

PrivateGPT will use the already existing `settings-azopenai.yaml` settings file, which is already configured to use Azure OpenAI LLM and Embeddings endpoints, and Qdrant.

The UI will be available at http://localhost:8001

### Local, Llama-CPP powered setup

If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:

```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
```

For local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
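
The download can take a while. Once it finishes, the downloaded model files should be present in the `models` folder mentioned above:

```bash
ls models/    # the downloaded model files should show up here
```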

Once installed, you can run PrivateGPT with the following command:

```bash
PGPT_PROFILES=local make run
```

PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.

The UI will be available at http://localhost:8001

#### Llama-CPP support

For PrivateGPT to run fully locally without Ollama, Llama.cpp is required; in particular,
[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is used.

You'll need to have a valid C++ compiler like gcc installed. See [Troubleshooting: C++ Compiler](#troubleshooting-c-compiler) for more details.

> It's highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform.
> Running into installation issues is very likely, and you'll need to troubleshoot them yourself.

##### Llama-CPP OSX GPU support

You will need to build [llama.cpp](https://github.com/ggerganov/llama.cpp) with metal support.

To do that, you need to install `llama.cpp`'s Python binding `llama-cpp-python` through pip, with the compilation flag
that activates `METAL`: you have to pass `-DLLAMA_METAL=on` to the CMake command that `pip` runs for you (see below).

In other words, simply run:
```bash
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```

The above command will force the re-installation of `llama-cpp-python` with `METAL` support by compiling
`llama.cpp` locally with your `METAL` libraries (shipped by default with your macOS).

More information is available in the documentation of the libraries themselves:
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-with-hardware-acceleration)
* [llama-cpp-python's documentation](https://llama-cpp-python.readthedocs.io/en/latest/#installation-with-hardware-acceleration)
* [llama.cpp](https://github.com/ggerganov/llama.cpp#build)

##### Llama-CPP Windows NVIDIA GPU support

Windows GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required
dependencies.

Some tips to get it working with an NVIDIA card and CUDA (Tested on Windows 10 with CUDA 11.5 RTX 3070):

* Install latest VS2022 (and build tools) https://visualstudio.microsoft.com/vs/community/
* Install CUDA toolkit https://developer.nvidia.com/cuda-downloads
* Verify your installation is correct by running `nvcc --version` and `nvidia-smi`; ensure your CUDA version is up to
  date and your GPU is detected.
* [Optional] Install CMake to troubleshoot building issues by compiling llama.cpp directly https://cmake.org/download/

If you have all required dependencies properly configured, running the
following PowerShell command should succeed.

```powershell
$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```

If your installation was correct, you should see a message similar to the following the next
time you start the server, including `BLAS = 1`.

```console
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```

Note that llama.cpp offloads matrix calculations to the GPU but the performance is
still hit heavily due to latency between CPU and GPU communication. You might need to tweak
batch sizes and other parameters to get the best performance for your particular system.

##### Llama-CPP Linux NVIDIA GPU support and Windows-WSL

Linux GPU support is done through CUDA.
Follow the instructions on the original [llama.cpp](https://github.com/ggerganov/llama.cpp) repo to install the required external dependencies.

Some tips:

* Make sure you have an up-to-date C++ compiler
* Install CUDA toolkit https://developer.nvidia.com/cuda-downloads
* Verify your installation is correct by running `nvcc --version` and `nvidia-smi`; ensure your CUDA version is up to
  date and your GPU is detected (see the quick check after this list).
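
For reference, the quick check looks like this:

```bash
# The CUDA compiler and the driver should both be visible
nvcc --version
nvidia-smi
```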

After that, running the following command in the repository will install llama.cpp with GPU support:

```bash
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```

If your installation was correct, you should see a message similar to the following the next
time you start the server, including `BLAS = 1`.

```console
llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
```

##### Llama-CPP Linux AMD GPU support

Linux GPU support is done through ROCm.
Some tips:
* Install ROCm from [quick-start install guide](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html)
* [Install PyTorch for ROCm](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/install-pytorch.html)
```bash
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0/torch-2.1.1%2Brocm6.0-cp311-cp311-linux_x86_64.whl
poetry run pip install --force-reinstall --no-cache-dir torch-2.1.1+rocm6.0-cp311-cp311-linux_x86_64.whl
```
* Install bitsandbytes for ROCm
```bash
PYTORCH_ROCM_ARCH=gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1101,gfx940,gfx941,gfx942
BITSANDBYTES_VERSION=62353b0200b8557026c176e74ac48b84b953a854
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6
cd bitsandbytes-rocm-5.6
git checkout ${BITSANDBYTES_VERSION}
make hip ROCM_TARGET=${PYTORCH_ROCM_ARCH} ROCM_HOME=/opt/rocm/
pip install . --extra-index-url https://download.pytorch.org/whl/nightly
```

After that, running the following command in the repository will install llama.cpp with GPU support:
```bash
LLAMA_CPP_PYTHON_VERSION=0.2.56
# Quote the target list: unquoted semicolons would otherwise split the line into separate commands
AMDGPU_TARGETS="gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942"
CMAKE_ARGS="-DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DAMDGPU_TARGETS=${AMDGPU_TARGETS}" poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python==${LLAMA_CPP_PYTHON_VERSION}
```

If your installation was correct, you should see a message similar to the following the next time you start the server, including `BLAS = 1`.

```console
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
```

##### Llama-CPP Known issues and Troubleshooting

Execution of LLMs locally still has a lot of sharp edges, especially when running on non-Linux platforms.
You might encounter several issues:

* Performance: RAM or VRAM usage is very high; your computer might experience slowdowns or even crashes.
* GPU Virtualization on Windows and OSX: simply not possible with Docker Desktop; you have to run the server directly on
  the host.
* Building errors: Some of PrivateGPT's dependencies need to build native code, and they might fail on some platforms.
  Most likely you are missing some dev tools on your machine (updated C++ compiler, CUDA is not on PATH, etc.).
  If you encounter any of these issues, please open an issue and we'll try to help.

One of the first reflexes to adopt is: get more information.
If, during your installation, something does not go as planned, retry in *verbose* mode and see what goes wrong.

For example, when installing packages with `pip install`, you can add the option `-vvv` to show the details of the installation.
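
For example, re-running the `llama-cpp-python` installation from this guide in verbose mode:

```bash
# Verbose output helps pinpoint which compilation step fails
poetry run pip install -vvv --force-reinstall --no-cache-dir llama-cpp-python
```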

##### Llama-CPP Troubleshooting: C++ Compiler

If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++
compiler on your computer.

**For Windows 10/11**

To install a C++ compiler on Windows 10/11, follow these steps:

1. Install Visual Studio 2022.
2. Make sure the following components are selected:
    * Universal Windows Platform development
    * C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the `gcc` component.

**For OSX**

1. Check if you have a C++ compiler installed; `Xcode` should have set one up for you. To install Xcode, go to the App
   Store, search for Xcode and install it. **Or** you can install the command line tools by running `xcode-select --install`.
2. If not, you can install clang or gcc with Homebrew: `brew install gcc`

##### Llama-CPP Troubleshooting: Mac Running Intel

When running on a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '-march=native'_ during pip install.

If so, set your archflags during pip install, e.g. _ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt_.
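
Applied to the `llama-cpp-python` installation used in this guide, that would look roughly like this (a sketch; combine it with any `CMAKE_ARGS` you need):

```bash
# Force an x86_64 build on Intel Macs to avoid the -march=native error
ARCHFLAGS="-arch x86_64" poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
```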