hierholzer committed · Commit 3024ff2 · verified · 1 Parent(s): 0dab441

Create README.md

Files changed (1): README.md (+195, -0)
---
language:
- en
license: mit
tags:
- meta
- pytorch
- llama-3.3
- llama-3.3-instruct
- gguf
model_name: Llama-3.3-70B-Instruct-GGUF
arxiv: 2407.21783
base_model: meta-llama/Llama-3.3-70b-instruct.hf
inference: false
model_creator: Meta Llama 3.3
model_type: llama
pipeline_tag: text-generation
prompt_template: >
  [INST] <<SYS>>

  You are a helpful, respectful and honest assistant. Always answer as helpfully
  as possible. If a question does not make any sense, or is not factually
  coherent, explain why instead of answering something that is not correct. If
  you don't know the answer to a question, do not answer it with false
  information.

  <</SYS>>

  {prompt}[/INST]
quantized_by: hierholzer
---

[![Hierholzer Banner](https://tvtime.us/static/images/LLAMA3.1.jpg)](#)

# GGUF Model
-----------------------------------


Here are quantized versions of Llama-3.3-70B-Instruct in GGUF format.

## 🤔 What Is GGUF
GGUF is designed for use with GGML and other executors.
GGUF was developed by @ggerganov, who is also the developer of llama.cpp, a popular C/C++ LLM inference framework.
Models initially developed in frameworks like PyTorch can be converted to GGUF format for use with those engines.

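For anyone curious how GGUF files like the ones in this repository are produced, here is a minimal sketch using llama.cpp's conversion script. The paths and output filenames below are illustrative assumptions, not the exact commands used for this repo:

```shell
# Assumes a local clone of llama.cpp with its Python requirements installed
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt

# Convert the original Hugging Face / PyTorch checkpoint into a (large) F16 GGUF file
python convert_hf_to_gguf.py /path/to/Llama-3.3-70B-Instruct \
    --outtype f16 --outfile Llama-3.3-70B-Instruct-F16.gguf
```

The F16 GGUF produced this way is then what gets quantized down to the smaller types listed in the sections below.
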
## ☑️Uploaded Quantization Types

Here are the quantized versions that I have available:

- [ ] Q2_K
- [ ] Q3_K_S
- [ ] Q3_K_M
- [ ] Q3_K_L
- [x] Q4_K_S
- [x] Q4_K_M ~ *Recommended*
- [x] Q5_K_S ~ *Recommended*
- [x] Q5_K_M ~ *Recommended*
- [ ] Q6_K
- [ ] Q8_0 ~ *NOT Recommended*
- [ ] F16 ~ *NOT Recommended*
- [ ] F32 ~ *NOT Recommended*

Feel free to reach out to me if you need a specific Quantization Type that I do not currently offer.

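If you prefer the command line to the web interface for downloading one of the files listed above, a sketch using huggingface-cli looks like this. The filename is illustrative; check the Files and Versions tab for the exact filenames, since larger quantizations may be split into multiple parts:

```shell
pip install -U "huggingface_hub[cli]"
huggingface-cli download hierholzer/Llama-3.3-70B-Instruct-GGUF \
    Llama-3.3-70B-Instruct-Q4_K_M.gguf --local-dir .
```
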
### 📈All Quantization Types Possible
Below is a table of all the quantization types that are possible, along with short descriptions.

| **#** | **Quantization Type** | _Description Of Quantization Types_                              |
|-------|:---------------------:|------------------------------------------------------------------|
| 2     | Q4_0                  | small, very high quality loss - legacy, prefer using Q3_K_M      |
| 3     | Q4_1                  | small, substantial quality loss - legacy, prefer using Q3_K_L    |
| 8     | Q5_0                  | medium, balanced quality - legacy, prefer using Q4_K_M           |
| 9     | Q5_1                  | medium, low quality loss - legacy, prefer using Q5_K_M           |
| 10    | Q2_K                  | smallest, extreme quality loss - *NOT Recommended*               |
| 12    | Q3_K                  | alias for Q3_K_M                                                 |
| 11    | Q3_K_S                | very small, very high quality loss                               |
| 12    | Q3_K_M                | very small, high quality loss                                    |
| 13    | Q3_K_L                | small, high quality loss                                         |
| 15    | Q4_K                  | alias for Q4_K_M                                                 |
| 14    | Q4_K_S                | small, some quality loss                                         |
| 15    | Q4_K_M                | medium, balanced quality - *Recommended*                         |
| 17    | Q5_K                  | alias for Q5_K_M                                                 |
| 16    | Q5_K_S                | large, low quality loss - *Recommended*                          |
| 17    | Q5_K_M                | large, very low quality loss - *Recommended*                     |
| 18    | Q6_K                  | very large, very low quality loss                                |
| 7     | Q8_0                  | very large, extremely low quality loss                           |
| 1     | F16                   | extremely large, virtually no quality loss - *NOT Recommended*   |
| 0     | F32                   | absolutely huge, lossless - *NOT Recommended*                    |

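As a hedged illustration of how these types are used in practice: llama.cpp's llama-quantize tool takes an unquantized GGUF file, an output filename, and one of the type names from the table above (the numeric ID in the first column can usually be given instead of the name). The filenames here are placeholders:

```shell
# llama-quantize is built as part of llama.cpp
# (binary location depends on your build; e.g. ./build/bin/llama-quantize for CMake builds)
./llama-quantize Llama-3.3-70B-Instruct-F16.gguf Llama-3.3-70B-Instruct-Q4_K_M.gguf Q4_K_M
```
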

## 💪 Benefits of using GGUF

By using a GGUF version of Llama-3.3-70B-Instruct, you can run this LLM with significantly fewer resources than the unquantized version requires.
In particular, this lets you run this 70B model on a machine with far less memory than the unquantized version would need.
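As a rough, back-of-the-envelope illustration (assumed bits-per-weight figures; ignores KV-cache and runtime overhead):

```shell
# ~4.8 bits/weight for Q4_K_M vs. 16 bits/weight for the unquantized FP16 weights
echo "Q4_K_M: $(echo "70*4.8/8" | bc -l) GB (approx.)"   # ~42 GB
echo "FP16:   $(echo "70*16/8"  | bc -l) GB (approx.)"   # ~140 GB
```
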


## ⚙️️Installation
--------------------------------------------
Here are 2 different methods you can use to run the quantized versions of Llama-3.3-70B-Instruct.

### 1️⃣ Text-generation-webui

Text-generation-webui is a web UI for Large Language Models that you can run locally.

#### ☑️ How to install Text-generation-webui
*If you already have Text-generation-webui, then skip this section*

| # | Download Text-generation-webui |
|----|------------------------------------------------------------------------------------------------------------------|
| 1. | Clone the text-generation-webui repository from GitHub by copying the git clone snippet below: |
```shell
git clone https://github.com/oobabooga/text-generation-webui.git
```
| # | Install Text-generation-webui |
|----|------------------------------------------------------------------------------------------------------------------|
| 1. | Run the `start_linux.sh`, `start_windows.bat`, `start_macos.sh`, or `start_wsl.bat` script, depending on your OS (see the Linux example below). |
| 2. | Select your GPU vendor when asked. |
| 3. | Once the installation script ends, browse to `http://localhost:7860`. |
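For example, on Linux the install steps above boil down to something like this (a sketch; the start scripts are the ones shipped in the repository you just cloned):

```shell
cd text-generation-webui
./start_linux.sh   # choose your GPU vendor when prompted, then browse to http://localhost:7860
```
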

#### ✅Using Llama-3.3-70B-Instruct-GGUF with Text-generation-webui
| # | Using Llama-3.3-70B-Instruct-GGUF with Text-generation-webui |
|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1. | Once you are running text-generation-webui in your browser, click on the 'Model' tab at the top of your window. |
| 2. | In the Download Model section, you need to enter the model repo: *hierholzer/Llama-3.3-70B-Instruct-GGUF* and, below it, the specific filename to download, such as: *Llama-3.3-70B-Instruct-Q4_K_M.gguf* |
| 3. | Click Download and wait for the download to complete. NOTE: you can see the download progress back in your terminal window. |
| 4. | Once the download is finished, click the blue refresh icon within the Model tab that you are in. |
| 5. | Select your newly downloaded GGUF file in the Model drop-down. Once selected, change the settings to best match your system. |

### 2️⃣ Ollama
Ollama runs as a local service.
Although it technically works using a command-line interface, Ollama's best attribute is its REST API.
Being able to use your locally run LLMs through this API opens up almost endless possibilities!
*Feel free to reach out to me if you would like to know some examples that I use this API for*

#### ☑️ How to install Ollama
Go to the URL below, and then select which OS you are using:
```shell
https://ollama.com/download
```
On Windows or Mac, you will then download a file and run it.
If you are using Linux, it will just provide a single command that you need to run in your terminal window.
*That's about it for installing Ollama*
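For reference, the Linux install is currently a single command like the one below; always check the download page above for the up-to-date version of it:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```
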
#### ✅Using Llama-3.3-70B-Instruct-GGUF with Ollama
Ollama does have a Model Library where you can download models:
```shell
https://ollama.com/library
```
This Model Library offers the standard builds of Llama 3.3-Instruct.
However, if you would like to use one of the quantized versions of Llama 3.3-Instruct from this repository,
then you will have to use the following instructions.
| # | Running the 70B quantized version of Llama 3.3-Instruct with Ollama |
|----|----------------------------------------------------------------------------------------------|
| 1. | Download your desired version from the Files and Versions section of this Model Repository. |
| 2. | Next, create a Modelfile configuration that defines the model's behavior. For example: |
```shell
# Modelfile
FROM "./Llama-3.3-70B-Instruct-Q4_K_M.gguf"
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```
*Replace ./Llama-3.3-70B-Instruct-Q4_K_M.gguf with the correct version and actual path to the GGUF file you downloaded.
The TEMPLATE block defines the prompt format using system, user, and assistant roles; the {{ .System }} and {{ .Prompt }} placeholders are where Ollama inserts the system message and the user prompt.
You can customize this based on your use case.*
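Note that the ChatML-style tokens above are only one possible format; Llama 3.3 itself was trained with the Llama 3 header/turn tokens. A hedged alternative TEMPLATE along those lines (again using Ollama's {{ .System }} and {{ .Prompt }} variables) might look roughly like this:

```shell
TEMPLATE """<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
PARAMETER stop "<|eot_id|>"
```
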
| # | Running the 70B quantized version of Llama 3.3-Instruct with Ollama - *continued* |
|----|-----------------------------------------------------------------------------------|
| 3. | Now, build the Ollama model using the ollama create command: |
```shell
ollama create Llama-3.3-70B-Instruct-Q4_K_M -f ./Modelfile
```
*Once again, replace the name Llama-3.3-70B-Instruct-Q4_K_M to match the quantized version you are using,
and make sure -f points to the Modelfile you created in the previous step (which itself references your downloaded .gguf file).*
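Before running it, you can confirm that the model was registered with Ollama:

```shell
ollama list
```
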
| # | Running the 70B quantized version of Llama 3.3-Instruct with Ollama - *continued* |
|----|-----------------------------------------------------------------------------------|
| 4. | You can then run your model using the ollama run command: |
```shell
ollama run Llama-3.3-70B-Instruct-Q4_K_M
```
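Once the model is created, you can also call it through the REST API mentioned at the start of the Ollama section. A minimal sketch, assuming Ollama is running on its default port (11434) and using the model name created above:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "Llama-3.3-70B-Instruct-Q4_K_M",
  "prompt": "Explain what GGUF quantization does in one paragraph.",
  "stream": false
}'
```
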

-------------------------------------------------

[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-FFD21E?logo=huggingface&logoColor=000)](#)
[![OS](https://img.shields.io/badge/OS-linux%2C%20windows%2C%20macOS-0078D4)](https://docs.abblix.com/docs/technical-requirements)
[![CPU](https://img.shields.io/badge/CPU-x86%2C%20x64%2C%20ARM%2C%20ARM64-FF8C00)](https://docs.abblix.com/docs/technical-requirements)
[![forthebadge](https://forthebadge.com/images/badges/license-mit.svg)](https://forthebadge.com)
[![forthebadge](https://forthebadge.com/images/badges/made-with-python.svg)](https://forthebadge.com)
[![forthebadge](https://forthebadge.com/images/badges/powered-by-electricity.svg)](https://forthebadge.com)