hierholzer committed · Commit 3024ff2 · verified · 1 Parent(s): 0dab441

Create README.md

Files changed (1): README.md (+195, -0)
---
language:
- en
license: mit
tags:
- meta
- pytorch
- llama-3.3
- llama-3.3-instruct
- gguf
model_name: Llama-3.3-70B-Instruct-GGUF
arxiv: 2407.21783
base_model: meta-llama/Llama-3.3-70b-instruct.hf
inference: false
model_creator: Meta Llama 3.3
model_type: llama
pipeline_tag: text-generation
prompt_template: >
  [INST] <<SYS>>

  You are a helpful, respectful and honest assistant. Always answer as helpfully
  as possible. If a question does not make any sense, or is not factually
  coherent, explain why instead of answering something that is not correct. If
  you don't know the answer to a question, do not answer it with false
  information.

  <</SYS>>

  {prompt}[/INST]
quantized_by: hierholzer
---

[![Hierholzer Banner](https://tvtime.us/static/images/LLAMA3.1.jpg)](#)

# GGUF Model
-----------------------------------


Here are quantized versions of Llama-3.3-70B-Instruct in GGUF format.

## 🤔 What Is GGUF
GGUF is designed for use with GGML and other executors.
GGUF was developed by @ggerganov, who is also the developer of llama.cpp, a popular C/C++ LLM inference framework.
Models initially developed in frameworks like PyTorch can be converted to GGUF format for use with those engines.

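For anyone curious how GGUF files like the ones in this repository are produced, here is a minimal sketch using llama.cpp's conversion script. The paths and output filenames below are illustrative assumptions, not the exact commands used for this repo:

```shell
# Assumes a local clone of llama.cpp with its Python requirements installed
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt

# Convert the original Hugging Face / PyTorch checkpoint into a (large) F16 GGUF file
python convert_hf_to_gguf.py /path/to/Llama-3.3-70B-Instruct \
    --outtype f16 --outfile Llama-3.3-70B-Instruct-F16.gguf
```

The F16 GGUF produced this way is then what gets quantized down to the smaller types listed in the sections below.
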
## ☑️Uploaded Quantization Types

Here are the quantized versions that I have available:

- [ ] Q2_K
- [ ] Q3_K_S
- [ ] Q3_K_M
- [ ] Q3_K_L
- [x] Q4_K_S
- [x] Q4_K_M ~ *Recommended*
- [x] Q5_K_S ~ *Recommended*
- [x] Q5_K_M ~ *Recommended*
- [ ] Q6_K
- [ ] Q8_0 ~ *NOT Recommended*
- [ ] F16 ~ *NOT Recommended*
- [ ] F32 ~ *NOT Recommended*

Feel free to reach out to me if you need a specific Quantization Type that I do not currently offer.

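If you prefer the command line to the web interface for downloading one of the files listed above, a sketch using huggingface-cli looks like this. The filename is illustrative; check the Files and Versions tab for the exact filenames, since larger quantizations may be split into multiple parts:

```shell
pip install -U "huggingface_hub[cli]"
huggingface-cli download hierholzer/Llama-3.3-70B-Instruct-GGUF \
    Llama-3.3-70B-Instruct-Q4_K_M.gguf --local-dir .
```
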
### 📈All Quantization Types Possible
Below is a table of all the quantization types that are possible, along with short descriptions.

| **#** | **Quantization Type** | _Description Of Quantization Types_                              |
|-------|:---------------------:|------------------------------------------------------------------|
| 2     | Q4_0                  | small, very high quality loss - legacy, prefer using Q3_K_M      |
| 3     | Q4_1                  | small, substantial quality loss - legacy, prefer using Q3_K_L    |
| 8     | Q5_0                  | medium, balanced quality - legacy, prefer using Q4_K_M           |
| 9     | Q5_1                  | medium, low quality loss - legacy, prefer using Q5_K_M           |
| 10    | Q2_K                  | smallest, extreme quality loss - *NOT Recommended*               |
| 12    | Q3_K                  | alias for Q3_K_M                                                 |
| 11    | Q3_K_S                | very small, very high quality loss                               |
| 12    | Q3_K_M                | very small, high quality loss                                    |
| 13    | Q3_K_L                | small, high quality loss                                         |
| 15    | Q4_K                  | alias for Q4_K_M                                                 |
| 14    | Q4_K_S                | small, some quality loss                                         |
| 15    | Q4_K_M                | medium, balanced quality - *Recommended*                         |
| 17    | Q5_K                  | alias for Q5_K_M                                                 |
| 16    | Q5_K_S                | large, low quality loss - *Recommended*                          |
| 17    | Q5_K_M                | large, very low quality loss - *Recommended*                     |
| 18    | Q6_K                  | very large, very low quality loss                                |
| 7     | Q8_0                  | very large, extremely low quality loss                           |
| 1     | F16                   | extremely large, virtually no quality loss - *NOT Recommended*   |
| 0     | F32                   | absolutely huge, lossless - *NOT Recommended*                    |

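As a hedged illustration of how these types are used in practice: llama.cpp's llama-quantize tool takes an unquantized GGUF file, an output filename, and one of the type names from the table above (the numeric ID in the first column can usually be given instead of the name). The filenames here are placeholders:

```shell
# llama-quantize is built as part of llama.cpp
# (binary location depends on your build; e.g. ./build/bin/llama-quantize for CMake builds)
./llama-quantize Llama-3.3-70B-Instruct-F16.gguf Llama-3.3-70B-Instruct-Q4_K_M.gguf Q4_K_M
```
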

## 💪 Benefits of using GGUF

By using a GGUF version of Llama-3.3-70B-Instruct, you can run this LLM with significantly fewer resources than the unquantized version requires.
In particular, this lets you run this 70B model on a machine with far less memory than the unquantized version would need.
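As a rough, back-of-the-envelope illustration (assumed bits-per-weight figures; ignores KV-cache and runtime overhead):

```shell
# ~4.8 bits/weight for Q4_K_M vs. 16 bits/weight for the unquantized FP16 weights
echo "Q4_K_M: $(echo "70*4.8/8" | bc -l) GB (approx.)"   # ~42 GB
echo "FP16:   $(echo "70*16/8"  | bc -l) GB (approx.)"   # ~140 GB
```
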


## ⚙️️Installation
--------------------------------------------
Here are 2 different methods you can use to run the quantized versions of Llama-3.3-70B-Instruct.

### 1️⃣ Text-generation-webui

Text-generation-webui is a web UI for Large Language Models that you can run locally.

#### ☑️ How to install Text-generation-webui
*If you already have Text-generation-webui, then skip this section*

| # | Download Text-generation-webui |
|----|------------------------------------------------------------------------------------------------------------------|
| 1. | Clone the text-generation-webui repository from GitHub by copying the git clone snippet below: |
```shell
git clone https://github.com/oobabooga/text-generation-webui.git
```
| # | Install Text-generation-webui |
|----|------------------------------------------------------------------------------------------------------------------|
| 1. | Run the `start_linux.sh`, `start_windows.bat`, `start_macos.sh`, or `start_wsl.bat` script, depending on your OS (see the Linux example below). |
| 2. | Select your GPU vendor when asked. |
| 3. | Once the installation script ends, browse to `http://localhost:7860`. |
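For example, on Linux the install steps above boil down to something like this (a sketch; the start scripts are the ones shipped in the repository you just cloned):

```shell
cd text-generation-webui
./start_linux.sh   # choose your GPU vendor when prompted, then browse to http://localhost:7860
```
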

#### ✅Using Llama-3.3-70B-Instruct-GGUF with Text-generation-webui
| # | Using Llama-3.3-70B-Instruct-GGUF with Text-generation-webui |
|----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1. | Once you are running text-generation-webui in your browser, click on the 'Model' tab at the top of your window. |
| 2. | In the Download Model section, you need to enter the model repo: *hierholzer/Llama-3.3-70B-Instruct-GGUF* and, below it, the specific filename to download, such as: *Llama-3.3-70B-Instruct-Q4_K_M.gguf* |
| 3. | Click Download and wait for the download to complete. NOTE: you can see the download progress back in your terminal window. |
| 4. | Once the download is finished, click the blue refresh icon within the Model tab that you are in. |
| 5. | Select your newly downloaded GGUF file in the Model drop-down. Once selected, change the settings to best match your system. |

### 2️⃣ Ollama
Ollama runs as a local service.
Although it technically works using a command-line interface, Ollama's best attribute is its REST API.
Being able to use your locally run LLMs through this API opens up almost endless possibilities!
*Feel free to reach out to me if you would like to know some examples that I use this API for*

#### ☑️ How to install Ollama
Go to the URL below, and then select which OS you are using:
```shell
https://ollama.com/download
```
On Windows or Mac, you will then download a file and run it.
If you are using Linux, it will just provide a single command that you need to run in your terminal window.
*That's about it for installing Ollama*
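For reference, the Linux install is currently a single command like the one below; always check the download page above for the up-to-date version of it:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```
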
#### ✅Using Llama-3.3-70B-Instruct-GGUF with Ollama
Ollama does have a Model Library where you can download models:
```shell
https://ollama.com/library
```
This Model Library offers the standard builds of Llama 3.3-Instruct.
However, if you would like to use one of the quantized versions of Llama 3.3-Instruct from this repository,
then you will have to use the following instructions.
| # | Running the 70B quantized version of Llama 3.3-Instruct with Ollama |
|----|----------------------------------------------------------------------------------------------|
| 1. | Download your desired version from the Files and Versions section of this Model Repository. |
| 2. | Next, create a Modelfile configuration that defines the model's behavior. For example: |
```shell
# Modelfile
FROM "./Llama-3.3-70B-Instruct-Q4_K_M.gguf"
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```
*Replace ./Llama-3.3-70B-Instruct-Q4_K_M.gguf with the correct version and actual path to the GGUF file you downloaded.
The TEMPLATE block defines the prompt format using system, user, and assistant roles; the {{ .System }} and {{ .Prompt }} placeholders are where Ollama inserts the system message and the user prompt.
You can customize this based on your use case.*
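Note that the ChatML-style tokens above are only one possible format; Llama 3.3 itself was trained with the Llama 3 header/turn tokens. A hedged alternative TEMPLATE along those lines (again using Ollama's {{ .System }} and {{ .Prompt }} variables) might look roughly like this:

```shell
TEMPLATE """<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
PARAMETER stop "<|eot_id|>"
```
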
| # | Running the 70B quantized version of Llama 3.3-Instruct with Ollama - *continued* |
|----|-----------------------------------------------------------------------------------|
| 3. | Now, build the Ollama model using the ollama create command: |
```shell
ollama create Llama-3.3-70B-Instruct-Q4_K_M -f ./Modelfile
```
*Once again, replace the name Llama-3.3-70B-Instruct-Q4_K_M to match the quantized version you are using,
and make sure -f points to the Modelfile you created in the previous step (which itself references your downloaded .gguf file).*
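Before running it, you can confirm that the model was registered with Ollama:

```shell
ollama list
```
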
| # | Running the 70B quantized version of Llama 3.3-Instruct with Ollama - *continued* |
|----|-----------------------------------------------------------------------------------|
| 4. | You can then run your model using the ollama run command: |
```shell
ollama run Llama-3.3-70B-Instruct-Q4_K_M
```
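Once the model is created, you can also call it through the REST API mentioned at the start of the Ollama section. A minimal sketch, assuming Ollama is running on its default port (11434) and using the model name created above:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "Llama-3.3-70B-Instruct-Q4_K_M",
  "prompt": "Explain what GGUF quantization does in one paragraph.",
  "stream": false
}'
```
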

-------------------------------------------------

[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-FFD21E?logo=huggingface&logoColor=000)](#)
[![OS](https://img.shields.io/badge/OS-linux%2C%20windows%2C%20macOS-0078D4)](https://docs.abblix.com/docs/technical-requirements)
[![CPU](https://img.shields.io/badge/CPU-x86%2C%20x64%2C%20ARM%2C%20ARM64-FF8C00)](https://docs.abblix.com/docs/technical-requirements)
[![forthebadge](https://forthebadge.com/images/badges/license-mit.svg)](https://forthebadge.com)
[![forthebadge](https://forthebadge.com/images/badges/made-with-python.svg)](https://forthebadge.com)
[![forthebadge](https://forthebadge.com/images/badges/powered-by-electricity.svg)](https://forthebadge.com)