Update README.md

README.md CHANGED

@@ -261,7 +261,7 @@ chat_completion = client.chat.completions.create(
 ## Quantization Reproduction
 
 > [!NOTE]
-> In order to quantize Llama 3.1 70B Instruct using AutoGPTQ, you will need to use an instance with at least enough CPU RAM to fit the whole model i.e. ~
+> In order to quantize Llama 3.1 70B Instruct using AutoGPTQ, you will need to use an instance with at least enough CPU RAM to fit the whole model i.e. ~140GiB, and an NVIDIA GPU with 40GiB of VRAM to quantize it.
 
 In order to quantize Llama 3.1 70B Instruct with GPTQ in INT4, you need to install the following packages:
 
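The quantization step the updated note refers to might be reproduced roughly as below. This is a hedged sketch only: the diff truncates the actual package list and shows no commands, so the use of `transformers`' `GPTQConfig` with the `auto-gptq` backend, the calibration dataset (`c4`), and the output path are all assumptions, not the repository's confirmed recipe.

```python
# Hypothetical sketch: quantizing Llama 3.1 70B Instruct to INT4 with GPTQ.
# Assumes `transformers`, `auto-gptq`, and `accelerate` are installed; the
# diff above truncates the real package list, so treat these as guesses.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ config; the calibration dataset name is illustrative.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Loading pulls the full ~140GiB of weights into CPU RAM, then quantizes
# layer by layer on a GPU with >= 40GiB VRAM, per the note above.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
model.save_pretrained("Meta-Llama-3.1-70B-Instruct-GPTQ-INT4")
```

The CPU-RAM requirement in the note exists because the unquantized FP16/BF16 checkpoint must be materialized in host memory before GPTQ can quantize it layer by layer on the GPU.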