---
license: llama3.2
language:
- en
- de
- es
- fr
- th
- pt
base_model:
- meta-llama/Llama-3.2-1B-Instruct
library_name: transformers
tags:
- meta
- llama
- llama-3
- pytorch
---

This model is `meta-llama/Llama-3.2-1B-Instruct` quantized to FP8 with dynamic activation quantization, produced using [llm-compressor](https://github.com/vllm-project/llm-compressor). The script below reproduces the quantization:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Define the model ID for the model you want to quantize
MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Configure the quantization recipe: FP8 dynamic quantization of all
# Linear layers, leaving the lm_head unquantized
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

# Apply the quantization algorithm
# (FP8_DYNAMIC computes activation scales at runtime, so no calibration data is needed)
oneshot(model=model, recipe=recipe)

# Define the directory to save the quantized model
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"

# Save the quantized model and tokenizer
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)

print(f"Quantized model saved to {SAVE_DIR}")
```
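
The saved checkpoint can be served with any runtime that understands `compressed-tensors` FP8 checkpoints, such as vLLM. A minimal inference sketch, assuming vLLM is installed and the model was saved to the `SAVE_DIR` path produced by the script above:

```python
from vllm import LLM, SamplingParams

# Load the FP8 checkpoint produced by the quantization script
# (the path matches SAVE_DIR above; adjust if you saved elsewhere)
llm = LLM(model="Llama-3.2-1B-Instruct-FP8-Dynamic")

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What does FP8 quantization change?"], sampling_params)
print(outputs[0].outputs[0].text)
```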