Gemma 3 Quantized Models

This repository contains W4A16 quantized versions of Google's Gemma 3 instruction-tuned models, reducing their memory footprint for deployment on consumer hardware while retaining most of the original models' performance.

Models

  • abhishekchohan/gemma-3-27b-it-quantized-W4A16
  • abhishekchohan/gemma-3-12b-it-quantized-W4A16
  • abhishekchohan/gemma-3-4b-it-quantized-W4A16

Repository Structure

```
gemma-3-{size}-it-quantized-W4A16/
├── README.md
├── templates/
│   └── chat_template.jinja
├── tools/
│   └── tool_parser.py
└── [model files]
```

Quantization Details

These models use W4A16 quantization via LLM Compressor:

  • Weights quantized to 4-bit precision
  • Activations kept at 16-bit precision
  • Weight memory reduced roughly 4x relative to the BF16 originals (a reproduction sketch follows below)
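
The exact recipe used for these checkpoints is not published in this card. For reference, a W4A16 quantization along these lines can be reproduced with llm-compressor roughly as in the sketch below; the GPTQ recipe, calibration dataset, and sample counts are illustrative assumptions, and note that older llm-compressor releases import `oneshot` from `llmcompressor.transformers` instead.

```python
# Minimal sketch of W4A16 quantization with llm-compressor.
# The recipe details here are illustrative, not the card's published recipe.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "google/gemma-3-12b-it"  # pick the size you want to quantize

# Quantize all Linear layers to 4-bit weights (activations stay 16-bit),
# leaving the output head unquantized.
recipe = GPTQModifier(scheme="W4A16", targets="Linear", ignore=["lm_head"])

oneshot(
    model=MODEL_ID,
    recipe=recipe,
    dataset="open_platypus",      # calibration data for GPTQ (illustrative choice)
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="gemma-3-12b-it-quantized-W4A16",
)
```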

Usage with vLLM

```bash
vllm serve abhishekchohan/gemma-3-{size}-it-quantized-W4A16 \
    --chat-template templates/chat_template.jinja \
    --enable-auto-tool-choice \
    --tool-call-parser gemma \
    --tool-parser-plugin tools/tool_parser.py
```
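
Once the server is up, it exposes an OpenAI-compatible API (by default at http://localhost:8000/v1). A minimal sketch of a tool-calling request follows, assuming the openai Python client and a locally running server; the get_weather tool is a made-up example to exercise --enable-auto-tool-choice, and the 12B model ID is just one of the three sizes:

```python
# Minimal sketch: query the vLLM OpenAI-compatible server started above.
# Assumes the default host/port (localhost:8000); adjust if you changed them.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A made-up tool definition to demonstrate automatic tool choice.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="abhishekchohan/gemma-3-12b-it-quantized-W4A16",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message)
```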

License

These models are subject to the Gemma license. Users must acknowledge and accept the license terms before using the models.

Citation

@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}