metadata
license: gemma
library_name: transformers
pipeline_tag: image-text-to-text
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
To access Gemma on Hugging Face, you're required to review and agree to
Google's usage license. To do this, please ensure you're logged in to Hugging
Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
base_model: google/gemma-3-12b-it
Gemma 3 Quantized Models
This repository contains W4A16 quantized versions of Google's Gemma 3 instruction-tuned models, making them more accessible for deployment on consumer hardware while maintaining good performance.
Models
- abhishekchohan/gemma-3-27b-it-quantized-W4A16
- abhishekchohan/gemma-3-12b-it-quantized-W4A16
- abhishekchohan/gemma-3-4b-it-quantized-W4A16
Repository Structure
gemma-3-{size}-it-quantized-W4A16/
βββ README.md
βββ templates/
β βββ chat_template.jinja
βββ tools/
β βββ tool_parser.py
βββ [model files]
Quantization Details
These models use W4A16 quantization via LLM Compressor:
- Weights quantized to 4-bit precision
- Activations use 16-bit precision
- Significantly reduced memory requirements
Usage with vLLM
vllm serve abhishekchohan/gemma-3-{size}-it-quantized-W4A16 --chat-template templates/chat_template.jinja --enable-auto-tool-choice --tool-call-parser gemma --tool-parser-plugin tools/tool_parser.py
License
These models are subject to the Gemma license. Users must acknowledge and accept the license terms before using the models.
Citation
@article{gemma_2025,
title={Gemma 3},
url={https://goo.gle/Gemma3Report},
publisher={Kaggle},
author={Gemma Team},
year={2025}
}