---
license: llama3.2
base_model:
- meta-llama/Llama-3.2-1B-Instruct
---

This model is a 4-bit GPTQ-quantized version of meta-llama/Llama-3.2-1B-Instruct, produced with the `transformers` GPTQ integration.

The code used to generate it is as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"

# The tokenizer is required so GPTQ can tokenize the "c4" calibration set.
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantization_config = GPTQConfig(
    bits=4,               # quantize weights to 4 bits
    group_size=128,       # one scale/zero-point per group of 128 weights
    dataset="c4",         # calibration dataset used during quantization
    desc_act=False,       # skip activation-order quantization for faster inference
    tokenizer=tokenizer,
)

# Quantization runs while the checkpoint loads; device_map="auto" places
# layers on the available devices.
quant_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
```
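
After quantization, the model and tokenizer can be saved (or pushed to the Hub) like any other `transformers` checkpoint. A minimal sketch; the output directory name here is a placeholder:

```python
# Persist the quantized weights and tokenizer.
# "Llama-3.2-1B-Instruct-GPTQ" is a placeholder output directory.
quant_model.save_pretrained("Llama-3.2-1B-Instruct-GPTQ")
tokenizer.save_pretrained("Llama-3.2-1B-Instruct-GPTQ")
```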
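
For completeness, a minimal inference sketch reusing `quant_model` and `tokenizer` from above; it assumes the tokenizer ships with Llama 3.2's chat template:

```python
# Generate a short reply with the quantized model.
messages = [{"role": "user", "content": "Summarize GPTQ in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(quant_model.device)

output_ids = quant_model.generate(input_ids, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```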