hierholzer committed (verified) · Commit cfa0cb6 · Parent: ccbe5ce

Update README.md

Files changed: README.md (+136 -12)
---
language:
- en
license: mit
tags:
- meta
- pytorch
- llama-3.1
- llama-3.1-instruct
- gguf
model_name: Llama-3.1-70B-Instruct-GGUF
arxiv: 2407.21783
base_model: meta-llama/Llama-3.1-70b-instruct.hf
inference: false
model_creator: Meta Llama 3.1
model_type: llama
pipeline_tag: text-generation
prompt_template: >
  [INST] <<SYS>>

  You are a helpful, respectful and honest assistant. Always answer as helpfully
  as possible. If a question does not make any sense, or is not factually
  coherent, explain why instead of answering something that is not correct. If
  you don't know the answer to a question, do not answer it with false
  information.

  <</SYS>>

  {prompt}[/INST]
quantized_by: hierholzer
---

[![Hierholzer Banner](https://tvtime.us/static/images/LLAMA3.1.jpg)](#)

# GGUF Model
-----------------------------------

Here are quantized versions of Llama-3.1-70B-Instruct using GGUF.

## 🤔 What Is GGUF?
GGUF is designed for use with GGML and other executors.
GGUF was developed by @ggerganov, who is also the developer of llama.cpp, a popular C/C++ LLM inference framework.
Models initially developed in frameworks like PyTorch can be converted to GGUF format for use with those engines; a conversion sketch follows below.
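
As a rough illustration (not part of this repository's workflow): llama.cpp ships a conversion script, and the checkpoint path and output filenames below are placeholders you would swap for your own.

```shell
# Grab llama.cpp, which includes the HF-to-GGUF conversion script
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt

# Convert a local Hugging Face checkpoint to an FP16 GGUF file
# (./Meta-Llama-3.1-70B-Instruct is a placeholder path)
python convert_hf_to_gguf.py ./Meta-Llama-3.1-70B-Instruct \
  --outfile Llama-3.1-70B-Instruct-F16.gguf --outtype f16

# Then quantize it down, e.g. to the Q4_K_M type recommended below
# (llama-quantize is built when you compile llama.cpp)
./llama-quantize Llama-3.1-70B-Instruct-F16.gguf \
  Llama-3.1-70B-Instruct-Q4_K_M.gguf Q4_K_M
```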

## ☑️ Uploaded Quantization Types

Here are the quantized versions that are currently available (a command-line download sketch follows the list):

- [x] Q4_K_M ~ *Recommended*
- [x] Q5_K_M ~ *Recommended*
- [x] Q8_0 ~ *NOT Recommended*
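
If you prefer the command line to the web interface, one way to fetch a single quantized file is with the Hugging Face CLI (assuming you have `huggingface_hub` installed; the filename is one of the versions listed above):

```shell
# Install the CLI if needed
pip install -U huggingface_hub

# Download just the Q4_K_M file from this repository into the current directory
huggingface-cli download hierholzer/Llama-3.1-70B-Instruct-GGUF \
  Llama-3.1-70B-Instruct-Q4_K_M.gguf --local-dir .
```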

Feel free to reach out to me if you need a specific quantization type that I do not currently offer.


### 📈 All Quantization Types Possible
Below is a table of all the quantization types that are possible.

| **#** | **Q#** | _Description Of Quantization Types_ |
|-------|:------:|----------------------------------------------------------------|
| …     | …      | *(intermediate rows not shown in this diff)* |
| 1     | F16    | extremely large, virtually no quality loss - *NOT Recommended* |
| 0     | F32    | absolutely huge, lossless - *NOT Recommended* |

## 💪 Benefits of Using GGUF

By using a GGUF version of Llama-3.1-70B-Instruct, you can run this LLM with significantly fewer resources than the non-quantized version requires, allowing the 70B model to fit on a machine with less memory.
As a rough guide: at FP16, 70B parameters take about 140 GB (2 bytes per weight), while Q4_K_M averages roughly 4.5 bits per weight, cutting the file to around 40 GB.


## ⚙️ Installation
--------------------------------------------
Here are two different methods you can use to run the quantized versions of Llama-3.1-70B-Instruct:

### 1️⃣ Text-generation-webui

Text-generation-webui is a web UI for Large Language Models that you can run locally.

#### ☑️ How to install Text-generation-webui
*If you already have Text-generation-webui, then skip this section*

| # | Download Text-generation-webui |
|----|------------------------------------------------------------------------------------------------------------------|
| 1. | Clone the text-generation-webui repository from GitHub by copying the git clone snippet below: |
```shell
git clone https://github.com/oobabooga/text-generation-webui.git
```
| # | Install Text-generation-webui |
|----|------------------------------------------------------------------------------------------------------------------|
| 1. | Run the `start_linux.sh`, `start_windows.bat`, `start_macos.sh`, or `start_wsl.bat` script depending on your OS. |
| 2. | Select your GPU vendor when asked. |
| 3. | Once the installation script ends, browse to `http://localhost:7860`. |

#### ✅ Using Llama-3.1-70B-Instruct-GGUF with Text-generation-webui
| # | Using Llama-3.1-70B-Instruct-GGUF with Text-generation-webui |
|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1. | Once you are running text-generation-webui in your browser, click on the 'Model' Tab at the top of your window. |
| 2. | In the Download Model section, enter the model repo: *hierholzer/Llama-3.1-70B-Instruct-GGUF* and, below it, the specific filename to download, such as: *Llama-3.1-70B-Instruct-Q4_K_M.gguf* |
| 3. | Click Download and wait for the download to complete. NOTE: you can watch the download progress in your terminal window. |
| 4. | Once the download is finished, click the blue refresh icon within the Model tab that you are in. |
| 5. | Select your newly downloaded GGUF file in the Model drop-down. Once selected, change the settings to best match your system. |

### 2️⃣ Ollama
Ollama runs as a local service.
Although it technically works using a command-line interface, Ollama's best attribute is its REST API (a request sketch follows at the end of this section).
Being able to utilize your locally run LLMs through this API can give you almost endless possibilities!
*Feel free to reach out to me if you would like to know some examples that I use this API for*

#### ☑️ How to install Ollama
Go to the URL below, and then select which OS you are using:
```shell
https://ollama.com/download
```
If you are using Windows or macOS, you will download a file and run it.
If you are using Linux, the site will provide a single command that you need to run in your terminal window.
*That's about it for installing Ollama*
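For reference, the Linux one-liner that the download page provides is shown below (at the time of writing; check the page itself for the current command):

```shell
# Ollama's documented Linux install script (verify on ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh
```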
#### ✅ Using Llama-3.1-70B-Instruct-GGUF with Ollama
Ollama does have a Model Library where you can download models:
```shell
https://ollama.com/library
```
This Model Library offers all sizes of regular Llama 3.1, as well as the 8B version of Llama 3.1-Instruct.
However, if you would like to use the 70B quantized version of Llama 3.1-Instruct,
then you will have to use the following instructions.
| # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama |
|----|----------------------------------------------------------------------------------------------|
| 1. | Download your desired version from the Files and versions section of this Model Repository |
| 2. | Next, create a Modelfile configuration that defines the model's behavior. For Example: |
```shell
# Modelfile
FROM "./Llama-3.1-70B-Instruct-Q4_K_M.gguf"

# Llama 3.1 uses header/end-of-turn special tokens, so stop generation there
PARAMETER stop "<|eot_id|>"

# Ollama substitutes {{ .System }} and {{ .Prompt }} at request time
TEMPLATE """<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
```
*Replace ./Llama-3.1-70B-Instruct-Q4_K_M.gguf with the correct version and actual path to the GGUF file you downloaded.
The TEMPLATE block defines the prompt format using the system, user, and assistant roles; note that Llama 3.1 uses the header/eot tokens shown above rather than ChatML's <|im_start|>/<|im_end|> markers.
You can customize this based on your use case.*
| # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama - *continued* |
|----|-----------------------------------------------------------------------------------|
| 3. | Now, build the Ollama model using the ollama create command: |
```shell
ollama create Llama-3.1-70B-Instruct-Q4_K_M -f ./Modelfile
```
*Once again, replace the name Llama-3.1-70B-Instruct-Q4_K_M with whatever you want your model to be called. Note that the -f flag takes the path to the Modelfile you created above, not the GGUF file itself; the GGUF path is set by the FROM line inside the Modelfile.*
| # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama - *continued* |
|----|-----------------------------------------------------------------------------------|
| 4. | You then can run your model using the ollama run command: |
```shell
ollama run Llama-3.1-70B-Instruct-Q4_K_M
```
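
Once created, the model is also reachable through the REST API mentioned above. A minimal sketch, assuming the Ollama service is listening on its default port (11434) and using the model name from step 3:

```shell
# Ask the model a question through Ollama's REST API.
# "stream": false returns a single JSON response instead of a token stream.
curl http://localhost:11434/api/generate -d '{
  "model": "Llama-3.1-70B-Instruct-Q4_K_M",
  "prompt": "Explain GGUF quantization in one paragraph.",
  "stream": false
}'
```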

-------------------------------------------------

[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-FFD21E?logo=huggingface&logoColor=000)](#)
[![OS](https://img.shields.io/badge/OS-linux%2C%20windows%2C%20macOS-0078D4)](https://docs.abblix.com/docs/technical-requirements)