Gemma 3 Quantized Collection
This repository contains W4A16 quantized versions of Google's Gemma 3 instruction-tuned models, reducing their memory footprint for deployment on consumer hardware while maintaining good performance.
```
gemma-3-{size}-it-quantized-W4A16/
├── README.md
├── templates/
│   └── chat_template.jinja
├── tools/
│   └── tool_parser.py
└── [model files]
```
These models use W4A16 quantization (4-bit weights, 16-bit activations), produced with LLM Compressor.
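To make the W4A16 format concrete, here is a minimal sketch of the underlying arithmetic: weights are stored as 4-bit integers with a per-group scale and dequantized to higher-precision floats at compute time. This illustrates the numeric format only; LLM Compressor's actual GPTQ-based pipeline is considerably more sophisticated, and the group size here is an illustrative choice.

```python
def quantize_w4(weights, group_size=4):
    """Symmetric 4-bit quantization: one scale per group, ints in [-8, 7]."""
    q, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # Scale so the largest magnitude in the group maps to +/-7.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales

def dequantize_w4(q, scales, group_size=4):
    """Reconstruct approximate float weights from ints and per-group scales."""
    return [q[i] * scales[i // group_size] for i in range(len(q))]

weights = [0.12, -0.30, 0.07, 0.25, -0.14, 0.02, 0.31, -0.08]
q, scales = quantize_w4(weights)
restored = dequantize_w4(q, scales)
# `restored` approximates `weights` to within the 4-bit quantization step.
```

Storing 4-bit integers plus a small number of scales is what cuts the weight memory roughly 4x versus 16-bit weights, while activations stay in 16-bit precision.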
To serve one of these models with vLLM, with tool calling enabled via the bundled chat template and parser plugin:

```shell
vllm serve abhishekchohan/gemma-3-{size}-it-quantized-W4A16 \
  --chat-template templates/chat_template.jinja \
  --enable-auto-tool-choice \
  --tool-call-parser gemma \
  --tool-parser-plugin tools/tool_parser.py
```
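Once running, the server exposes an OpenAI-compatible API. Below is a sketch of a tool-calling request body for the `/v1/chat/completions` endpoint; the `get_weather` tool is a made-up example, and `{size}` is the same placeholder used throughout this card.

```python
import json

# Illustrative request body for vLLM's OpenAI-compatible chat endpoint.
# The tool schema below is hypothetical, for demonstration only.
payload = {
    "model": "abhishekchohan/gemma-3-{size}-it-quantized-W4A16",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    # Let the model decide whether to call the tool, matching
    # the server's --enable-auto-tool-choice flag.
    "tool_choice": "auto",
}
body = json.dumps(payload)
```

POSTing this body (e.g. with `curl` or the `openai` client) returns either a normal assistant message or a `tool_calls` entry parsed by the bundled `tool_parser.py`.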
These models are subject to the Gemma license. Users must acknowledge and accept the license terms before using the models.
```bibtex
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}
```