metadata
license: mit
language:
  - en


Model

This is a quantized version of Llama-3.1-70B-Instruct in GGUF format.

GGUF is designed for use with GGML and other executors.
GGUF was developed by @ggerganov, who is also the developer of llama.cpp, a popular C/C++ LLM inference framework.
Models initially developed in frameworks like PyTorch can be converted to GGUF format for use with those engines.
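
As a quick sanity check that a downloaded file really is GGUF, the fixed-size header can be read directly. The sketch below assumes the GGUF v2+ layout (4-byte magic `GGUF`, a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count, all little-endian). The function name `read_gguf_header` and the demo bytes are made up for illustration and are not part of any library.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header (assumed v2+ little-endian layout)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes to read the header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    # "<IQQ" = little-endian uint32, uint64, uint64, read after the magic.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header for demonstration only; a real file would be read with
# open(path, "rb").read(HEADER_SIZE).
demo = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(demo))
```

Only the first 24 bytes are needed, so this check is cheap even for a file of tens of gigabytes.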

Uploaded Quantization Types

Currently, I have uploaded 3 quantized versions:

  • Q5_K_M ~ Recommended
  • Q8_0
  • Q4_K_M

All Quantization Types Possible

Here are all of the possible quantization types. Let me know if you need any other versions.

# or Q# : Description Of Quantization Types
2 or Q4_0 : small, very high quality loss - legacy, prefer using Q3_K_M
3 or Q4_1 : small, substantial quality loss - legacy, prefer using Q3_K_L
8 or Q5_0 : medium, balanced quality - legacy, prefer using Q4_K_M
9 or Q5_1 : medium, low quality loss - legacy, prefer using Q5_K_M
10 or Q2_K : smallest, extreme quality loss - NOT Recommended
12 or Q3_K : alias for Q3_K_M
11 or Q3_K_S : very small, very high quality loss
12 or Q3_K_M : very small, high quality loss
13 or Q3_K_L : small, high quality loss
15 or Q4_K : alias for Q4_K_M
14 or Q4_K_S : small, some quality loss
15 or Q4_K_M : medium, balanced quality - Recommended
17 or Q5_K : alias for Q5_K_M
16 or Q5_K_S : large, low quality loss - Recommended
17 or Q5_K_M : large, very low quality loss - Recommended
18 or Q6_K : very large, very low quality loss
7 or Q8_0 : very large, extremely low quality loss
1 or F16 : extremely large, virtually no quality loss - NOT Recommended
0 or F32 : absolutely huge, lossless - NOT Recommended
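
The table above appears to mirror the type list accepted by llama.cpp's quantize tool, which takes either the number or the name. As a rough sketch of producing another version yourself, the snippet below builds such a command line. The binary name `llama-quantize`, the argument order (input, output, type), and the file names are assumptions based on recent llama.cpp builds, so check them against your local copy. The helper `quantize_command` is hypothetical.

```python
import shlex

# Names and numeric IDs copied from the table above.
QUANT_TYPES = {
    "Q4_0": 2, "Q4_1": 3, "Q5_0": 8, "Q5_1": 9, "Q2_K": 10,
    "Q3_K_S": 11, "Q3_K_M": 12, "Q3_K_L": 13,
    "Q4_K_S": 14, "Q4_K_M": 15, "Q5_K_S": 16, "Q5_K_M": 17,
    "Q6_K": 18, "Q8_0": 7, "F16": 1, "F32": 0,
}
# Aliases listed in the table.
ALIASES = {"Q3_K": "Q3_K_M", "Q4_K": "Q4_K_M", "Q5_K": "Q5_K_M"}


def quantize_command(src, dst, qtype, binary="llama-quantize"):
    """Return an argv list for quantizing `src` into `dst` (nothing is run)."""
    name = qtype.upper()
    name = ALIASES.get(name, name)
    if name not in QUANT_TYPES:
        raise ValueError(f"unknown quantization type: {qtype!r}")
    return [binary, src, dst, name]


# Placeholder file names; substitute your own paths.
cmd = quantize_command("model-f16.gguf", "model-Q5_K_M.gguf", "q5_k")
print(shlex.join(cmd))
```

The function only builds the argument list. Pass the result to `subprocess.run` once the paths and binary have been verified.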

Uses

By using the GGUF version of Llama-3.1-70B-Instruct, you can run this LLM with significantly fewer resources than the non-quantized version requires.
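
To put rough numbers on the savings, file size can be estimated as parameters × bits per weight ÷ 8. The sketch below assumes a nominal 70 billion parameters. The bits-per-weight figures for Q5_K_M and Q4_K_M are approximations, since those types mix block formats across tensors. Treat the output as a ballpark, not as the actual sizes of the uploaded files.

```python
NOMINAL_PARAMS = 70e9  # assumption: nominal parameter count for a "70B" model

# Bits per weight. F32/F16 are exact; Q8_0 stores 32 int8 weights plus one
# fp16 scale per block (34 bytes / 32 weights = 8.5 bits). The K-quant
# values are approximate assumptions.
BITS_PER_WEIGHT = {
    "F32": 32.0,
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,  # approximate
    "Q4_K_M": 4.8,  # approximate
}


def estimate_size_gb(qtype, n_params=NOMINAL_PARAMS):
    """Estimate model file size in GB (10^9 bytes), ignoring metadata."""
    return n_params * BITS_PER_WEIGHT[qtype] / 8 / 1e9


for qtype in BITS_PER_WEIGHT:
    print(f"{qtype:>7}: ~{estimate_size_gb(qtype):.0f} GB")
```

Memory needed at inference time is higher than the file size, because the context cache and runtime buffers come on top of the weights.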
