---
language:
- en
license: mit
tags:
- meta
- pytorch
- llama-3.1
- llama-3.1-instruct
- gguf
model_name: Llama-3.1-70B-Instruct-GGUF
arxiv: 2407.21783
base_model: meta-llama/Llama-3.1-70b-instruct.hf
inference: false
model_creator: Meta Llama 3.1
model_type: llama
pipeline_tag: text-generation
prompt_template: >
  [INST] <<SYS>>

  You are a helpful, respectful and honest assistant. Always answer as helpfully
  as possible. If a question does not make any sense, or is not factually
  coherent, explain why instead of answering something that is not correct. If
  you don't know the answer to a question, do not answer it with false
  information.

  <</SYS>>

  {prompt}[/INST]
quantized_by: hierholzer
---

[![Hierholzer Banner](https://tvtime.us/static/images/LLAMA3.1.jpg)](#)

# GGUF Model
-----------------------------------

Here are quantized versions of Llama-3.1-70B-Instruct in GGUF format.

## 🤔 What Is GGUF
GGUF is designed for use with GGML and other executors.
GGUF was developed by @ggerganov, who is also the developer of llama.cpp, a popular C/C++ LLM inference framework.
Models initially developed in frameworks like PyTorch can be converted to GGUF format for use with those engines.
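
For example, here is a minimal sketch of such a conversion using llama.cpp's conversion script (the model directory and output filenames below are placeholders, not files from this repository):

```shell
# Get llama.cpp, which ships the HF-to-GGUF conversion script
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt

# Convert a local Hugging Face model directory into an F16 GGUF file
python convert_hf_to_gguf.py ./my-hf-model --outfile ./my-model-F16.gguf --outtype f16
```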

## ☑️ Uploaded Quantization Types

Here are the quantized versions available:

- [x] Q4_K_M ~ *Recommended*
- [x] Q5_K_M ~ *Recommended*
- [x] Q8_0 ~ *NOT Recommended*
- [ ]

Feel free to reach out to me if you need a specific quantization type that I do not currently offer.

### 📈 All Quantization Types Possible

Below is a table of all the quantization types that are possible.

| **#** | **or** | **Q#** | **:** | _Description Of Quantization Types_ |
|-------|:------:|:------:|:-----:|----------------------------------------------------------------|
| 1 | or | F16 | : | extremely large, virtually no quality loss - *NOT Recommended* |
| 0 | or | F32 | : | absolutely huge, lossless - *NOT Recommended* |
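
These type names (by number or by name) are what llama.cpp's quantize tool accepts. As a rough sketch of how a quantized file is produced from an F16 GGUF (filenames here are placeholders; in older llama.cpp builds the binary is named `quantize` instead of `llama-quantize`):

```shell
# Quantize an F16 GGUF down to Q4_K_M with llama.cpp's quantize tool
./llama-quantize ./Llama-3.1-70B-Instruct-F16.gguf ./Llama-3.1-70B-Instruct-Q4_K_M.gguf Q4_K_M
```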

## 💪 Benefits of using GGUF

By using a GGUF version of Llama-3.1-70B-Instruct, you can run this LLM with significantly fewer resources than the non-quantized version requires.
This also allows you to run this 70B model on a machine with less memory than the non-quantized version would need.
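
As a rough guide: the F16 weights use about 2 bytes per parameter, so 70B parameters come to roughly 140 GB, while Q4_K_M averages around 4.5-5 bits per weight, or roughly 40-45 GB. These are back-of-envelope estimates; the exact file sizes are listed in this repository's Files and Versions section.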

## ⚙️ Installation
--------------------------------------------
Here are two different methods you can use to run the quantized versions of Llama-3.1-70B-Instruct.

### 1️⃣ Text-generation-webui

Text-generation-webui is a web UI for Large Language Models that you can run locally.

#### ☑️ How to install Text-generation-webui
*If you already have Text-generation-webui, then skip this section*

| # | Download Text-generation-webui |
|----|------------------------------------------------------------------------------------------------------------------|
| 1. | Clone the text-generation-webui repository from GitHub by copying the git clone snippet below: |

```shell
git clone https://github.com/oobabooga/text-generation-webui.git
```

| # | Install Text-generation-webui |
|----|------------------------------------------------------------------------------------------------------------------|
| 1. | Run the `start_linux.sh`, `start_windows.bat`, `start_macos.sh`, or `start_wsl.bat` script depending on your OS. |
| 2. | Select your GPU vendor when asked. |
| 3. | Once the installation script ends, browse to `http://localhost:7860`. |

#### ✅ Using Llama-3.1-70B-Instruct-GGUF with Text-generation-webui

| # | Using Llama-3.1-70B-Instruct-GGUF with Text-generation-webui |
|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1. | Once you are running text-generation-webui in your browser, click on the 'Model' Tab at the top of your window. |
| 2. | In the Download Model section, enter the model repo: *hierholzer/Llama-3.1-70B-Instruct-GGUF* and, below it, the specific filename to download, such as: *Llama-3.1-70B-Instruct-Q4_K_M.gguf* |
| 3. | Click Download and wait for the download to complete. NOTE: you can see the download progress back in your terminal window. |
| 4. | Once the download is finished, click the blue refresh icon within the Model tab that you are in. |
| 5. | Select your newly downloaded GGUF file in the Model drop-down. Once selected, change the settings to best match your system. |
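
If you would rather fetch the GGUF file outside of the UI, a minimal sketch using the Hugging Face CLI would look like this (the quant filename and models folder are examples; pick the file you actually want):

```shell
# Install the Hugging Face CLI if needed
pip install -U "huggingface_hub[cli]"

# Download one specific quant file from this repository into the webui models folder
huggingface-cli download hierholzer/Llama-3.1-70B-Instruct-GGUF \
  Llama-3.1-70B-Instruct-Q4_K_M.gguf --local-dir ./text-generation-webui/models
```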

### 2️⃣ Ollama
Ollama runs as a local service.
Although it technically works using a command-line interface, Ollama's best attribute is its REST API.
Being able to utilize your locally run LLMs through this API can give you almost endless possibilities! (A sample API call is shown at the end of this section.)
*Feel free to reach out to me if you would like to know some examples that I use this API for*

#### ☑️ How to install Ollama
Go to the URL below, and then select which OS you are using:

```shell
https://ollama.com/download
```

On Windows or Mac, you will download a file and run it.
If you are using Linux, the page will just provide a single command that you need to run in your terminal window.
*That's about it for installing Ollama*
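
For reference, at the time of writing the single Linux command that the download page provides is the following (check the page itself for the current version):

```shell
# Ollama's documented one-line install script for Linux
curl -fsSL https://ollama.com/install.sh | sh
```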

#### ✅ Using Llama-3.1-70B-Instruct-GGUF with Ollama
Ollama does have a Model Library where you can download models:

```shell
https://ollama.com/library
```

This Model Library offers all sizes of regular Llama 3.1, as well as the 8B version of Llama 3.1-Instruct.
However, if you would like to use the 70B quantized version of Llama 3.1-Instruct,
then you will have to use the following instructions.

| # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama |
|----|----------------------------------------------------------------------------------------------|
| 1. | Download your desired version in the Files and Versions section of this Model Repository |
| 2. | Next, create a Modelfile configuration that defines the model's behavior. For example: |

```shell
# Modelfile
FROM "./Llama-3.1-70B-Instruct-Q4_K_M.gguf"
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```

*Replace ./Llama-3.1-70B-Instruct-Q4_K_M.gguf with the correct version and actual path to the GGUF file you downloaded.
The TEMPLATE line defines the prompt format using system, user, and assistant roles; the {{ .System }} and {{ .Prompt }} placeholders are filled in by Ollama at run time.
You can customize this based on your use case.*

| # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama - *continued* |
|----|-----------------------------------------------------------------------------------|
| 3. | Now, build the Ollama model using the ollama create command: |

```shell
ollama create Llama-3.1-70B-Instruct-Q4_K_M -f ./Modelfile
```

*Once again, replace the name Llama-3.1-70B-Instruct-Q4_K_M to match the quantized model you are using; the -f flag points to the Modelfile you created in step 2, whose FROM line references your downloaded GGUF file.*

| # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama - *continued* |
|----|-----------------------------------------------------------------------------------|
| 4. | You can then run your model using the ollama run command: |

```shell
ollama run Llama-3.1-70B-Instruct-Q4_K_M
```
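
Once the model is created, you can also call it through Ollama's REST API, which listens on port 11434 by default. A minimal sketch (the model name is whatever you passed to ollama create):

```shell
# Ask the locally running model a question via Ollama's REST API
curl http://localhost:11434/api/generate -d '{
  "model": "Llama-3.1-70B-Instruct-Q4_K_M",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```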

-------------------------------------------------

[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-FFD21E?logo=huggingface&logoColor=000)](#)
[![OS](https://img.shields.io/badge/OS-linux%2C%20windows%2C%20macOS-0078D4)](https://docs.abblix.com/docs/technical-requirements)