---
language:
- en
license: mit
tags:
- meta
- pytorch
- llama-3.1
- llama-3.1-instruct
- gguf
model_name: Llama-3.1-70B-Instruct-GGUF
arxiv: 2407.21783
base_model: meta-llama/Llama-3.1-70b-instruct.hf
inference: false
model_creator: Meta Llama 3.1
model_type: llama
pipeline_tag: text-generation
prompt_template: >
  [INST] <<SYS>>

  You are a helpful, respectful and honest assistant. Always answer as helpfully
  as possible. If a question does not make any sense, or is not factually
  coherent, explain why instead of answering something that is not correct. If
  you don't know the answer to a question, do not answer it with false
  information.

  <</SYS>>

  {prompt}[/INST]
quantized_by: hierholzer
---

[![Hierholzer Banner](https://tvtime.us/static/images/LLAMA3.1.jpg)](#)

# GGUF Model
-----------------------------------

Here are quantized versions of Llama-3.1-70B-Instruct in GGUF format.

## 🤔 What Is GGUF
GGUF is designed for use with GGML and other executors.
GGUF was developed by @ggerganov, who is also the developer of llama.cpp, a popular C/C++ LLM inference framework.
Models initially developed in frameworks like PyTorch can be converted to GGUF format for use with those engines.
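
For example, here is a minimal sketch of such a conversion using llama.cpp's conversion script (the model directory and output filenames below are placeholders, not files from this repository):

```shell
# Get llama.cpp, which ships the HF-to-GGUF conversion script
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt

# Convert a local Hugging Face model directory into an F16 GGUF file
python convert_hf_to_gguf.py ./my-hf-model --outfile ./my-model-F16.gguf --outtype f16
```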

## ☑️ Uploaded Quantization Types

Here are the quantized versions available:

- [x] Q4_K_M ~ *Recommended*
- [x] Q5_K_M ~ *Recommended*
- [x] Q8_0 ~ *NOT Recommended*
- [ ]

Feel free to reach out to me if you need a specific quantization type that I do not currently offer.

### 📈 All Quantization Types Possible

Below is a table of all the quantization types that are possible.

| **#** | **or** | **Q#** | **:** | _Description Of Quantization Types_ |
|-------|:------:|:------:|:-----:|----------------------------------------------------------------|
| 1 | or | F16 | : | extremely large, virtually no quality loss - *NOT Recommended* |
| 0 | or | F32 | : | absolutely huge, lossless - *NOT Recommended* |
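
These type names (by number or by name) are what llama.cpp's quantize tool accepts. As a rough sketch of how a quantized file is produced from an F16 GGUF (filenames here are placeholders; in older llama.cpp builds the binary is named `quantize` instead of `llama-quantize`):

```shell
# Quantize an F16 GGUF down to Q4_K_M with llama.cpp's quantize tool
./llama-quantize ./Llama-3.1-70B-Instruct-F16.gguf ./Llama-3.1-70B-Instruct-Q4_K_M.gguf Q4_K_M
```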

## 💪 Benefits of using GGUF

By using a GGUF version of Llama-3.1-70B-Instruct, you can run this LLM with significantly fewer resources than the non-quantized version requires.
This also allows you to run this 70B model on a machine with less memory than the non-quantized version would need.
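
As a rough guide: the F16 weights use about 2 bytes per parameter, so 70B parameters come to roughly 140 GB, while Q4_K_M averages around 4.5-5 bits per weight, or roughly 40-45 GB. These are back-of-envelope estimates; the exact file sizes are listed in this repository's Files and Versions section.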

## ⚙️ Installation
--------------------------------------------
Here are two different methods you can use to run the quantized versions of Llama-3.1-70B-Instruct.

### 1️⃣ Text-generation-webui

Text-generation-webui is a web UI for Large Language Models that you can run locally.

#### ☑️ How to install Text-generation-webui
*If you already have Text-generation-webui, then skip this section*

| # | Download Text-generation-webui |
|----|------------------------------------------------------------------------------------------------------------------|
| 1. | Clone the text-generation-webui repository from GitHub by copying the git clone snippet below: |

```shell
git clone https://github.com/oobabooga/text-generation-webui.git
```

| # | Install Text-generation-webui |
|----|------------------------------------------------------------------------------------------------------------------|
| 1. | Run the `start_linux.sh`, `start_windows.bat`, `start_macos.sh`, or `start_wsl.bat` script depending on your OS. |
| 2. | Select your GPU vendor when asked. |
| 3. | Once the installation script ends, browse to `http://localhost:7860`. |

#### ✅ Using Llama-3.1-70B-Instruct-GGUF with Text-generation-webui

| # | Using Llama-3.1-70B-Instruct-GGUF with Text-generation-webui |
|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1. | Once you are running text-generation-webui in your browser, click on the 'Model' Tab at the top of your window. |
| 2. | In the Download Model section, enter the model repo: *hierholzer/Llama-3.1-70B-Instruct-GGUF* and, below it, the specific filename to download, such as: *Llama-3.1-70B-Instruct-Q4_K_M.gguf* |
| 3. | Click Download and wait for the download to complete. NOTE: you can see the download progress back in your terminal window. |
| 4. | Once the download is finished, click the blue refresh icon within the Model tab that you are in. |
| 5. | Select your newly downloaded GGUF file in the Model drop-down. Once selected, change the settings to best match your system. |
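
If you would rather fetch the GGUF file outside of the UI, a minimal sketch using the Hugging Face CLI would look like this (the quant filename and models folder are examples; pick the file you actually want):

```shell
# Install the Hugging Face CLI if needed
pip install -U "huggingface_hub[cli]"

# Download one specific quant file from this repository into the webui models folder
huggingface-cli download hierholzer/Llama-3.1-70B-Instruct-GGUF \
  Llama-3.1-70B-Instruct-Q4_K_M.gguf --local-dir ./text-generation-webui/models
```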

### 2️⃣ Ollama
Ollama runs as a local service.
Although it technically works using a command-line interface, Ollama's best attribute is its REST API.
Being able to utilize your locally run LLMs through this API can give you almost endless possibilities! (A sample API call is shown at the end of this section.)
*Feel free to reach out to me if you would like to know some examples that I use this API for*

#### ☑️ How to install Ollama
Go to the URL below, and then select which OS you are using:

```shell
https://ollama.com/download
```

On Windows or Mac, you will download a file and run it.
If you are using Linux, the page will just provide a single command that you need to run in your terminal window.
*That's about it for installing Ollama*
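
For reference, at the time of writing the single Linux command that the download page provides is the following (check the page itself for the current version):

```shell
# Ollama's documented one-line install script for Linux
curl -fsSL https://ollama.com/install.sh | sh
```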

#### ✅ Using Llama-3.1-70B-Instruct-GGUF with Ollama
Ollama does have a Model Library where you can download models:

```shell
https://ollama.com/library
```

This Model Library offers all sizes of regular Llama 3.1, as well as the 8B version of Llama 3.1-Instruct.
However, if you would like to use the 70B quantized version of Llama 3.1-Instruct,
then you will have to use the following instructions.

| # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama |
|----|----------------------------------------------------------------------------------------------|
| 1. | Download your desired version in the Files and Versions section of this Model Repository |
| 2. | Next, create a Modelfile configuration that defines the model's behavior. For example: |

```shell
# Modelfile
FROM "./Llama-3.1-70B-Instruct-Q4_K_M.gguf"
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```

*Replace ./Llama-3.1-70B-Instruct-Q4_K_M.gguf with the correct version and actual path to the GGUF file you downloaded.
The TEMPLATE line defines the prompt format using system, user, and assistant roles; the {{ .System }} and {{ .Prompt }} placeholders are filled in by Ollama at run time.
You can customize this based on your use case.*

| # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama - *continued* |
|----|-----------------------------------------------------------------------------------|
| 3. | Now, build the Ollama model using the ollama create command: |

```shell
ollama create Llama-3.1-70B-Instruct-Q4_K_M -f ./Modelfile
```

*Once again, replace the name Llama-3.1-70B-Instruct-Q4_K_M to match the quantized model you are using; the -f flag points to the Modelfile you created in step 2, whose FROM line references your downloaded GGUF file.*

| # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama - *continued* |
|----|-----------------------------------------------------------------------------------|
| 4. | You can then run your model using the ollama run command: |

```shell
ollama run Llama-3.1-70B-Instruct-Q4_K_M
```
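
Once the model is created, you can also call it through Ollama's REST API, which listens on port 11434 by default. A minimal sketch (the model name is whatever you passed to ollama create):

```shell
# Ask the locally running model a question via Ollama's REST API
curl http://localhost:11434/api/generate -d '{
  "model": "Llama-3.1-70B-Instruct-Q4_K_M",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```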

-------------------------------------------------

[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-FFD21E?logo=huggingface&logoColor=000)](#)
[![OS](https://img.shields.io/badge/OS-linux%2C%20windows%2C%20macOS-0078D4)](https://docs.abblix.com/docs/technical-requirements)