Update README.md

README.md CHANGED

@@ -261,7 +261,7 @@ chat_completion = client.chat.completions.create(
 ## Quantization Reproduction
 
 > [!NOTE]
-> In order to quantize Llama 3.1 70B Instruct using AutoGPTQ, you will need to use an instance with at least enough CPU RAM to fit the whole model i.e. ~
+> In order to quantize Llama 3.1 70B Instruct using AutoGPTQ, you will need to use an instance with at least enough CPU RAM to fit the whole model i.e. ~140GiB, and an NVIDIA GPU with 40GiB of VRAM to quantize it.
 
 In order to quantize Llama 3.1 70B Instruct with GPTQ in INT4, you need to install the following packages:
 
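The quantization step the updated note refers to might be reproduced roughly as below. This is a hedged sketch only: the diff truncates the actual package list and shows no commands, so the use of `transformers`' `GPTQConfig` with the `auto-gptq` backend, the calibration dataset (`c4`), and the output path are all assumptions, not the repository's confirmed recipe.

```python
# Hypothetical sketch: quantizing Llama 3.1 70B Instruct to INT4 with GPTQ.
# Assumes `transformers`, `auto-gptq`, and `accelerate` are installed; the
# diff above truncates the real package list, so treat these as guesses.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ config; the calibration dataset name is illustrative.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Loading pulls the full ~140GiB of weights into CPU RAM, then quantizes
# layer by layer on a GPU with >= 40GiB VRAM, per the note above.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
model.save_pretrained("Meta-Llama-3.1-70B-Instruct-GPTQ-INT4")
```

The CPU-RAM requirement in the note exists because the unquantized FP16/BF16 checkpoint must be materialized in host memory before GPTQ can quantize it layer by layer on the GPU.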