---
license: mit
license_link: https://huggingface.co/microsoft/phi-4/resolve/main/LICENSE
language:
- en
pipeline_tag: text-generation
tags:
- phi
- phi4
- nlp
- math
- code
- chat
- conversational
base_model: microsoft/phi-4
library_name: transformers
---
# Phi-4 GPTQ (4-bit Quantized)

[![Model](https://img.shields.io/badge/HuggingFace-Phi--4--GPTQ-orange)](https://huggingface.co/fhamborg/phi-4-4bit-gptq)

## Model Description
This is a **4-bit quantized** version of the Phi-4 transformer model, optimized for **efficient inference** while preserving most of the base model's generation quality.

- **Base Model**: [Phi-4](https://huggingface.co/microsoft/phi-4)  
- **Quantization**: 4-bit (AutoRound and bitsandbytes/bnb)  
- **Format**: `safetensors`  
- **Tokenizer**: Uses standard `vocab.json` and `merges.txt`  
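
As a quick check that the checkpoint loads, the sketch below uses the standard `transformers` loading path. It assumes `transformers`, `accelerate`, and a matching quantization backend (e.g. `optimum`/`auto-gptq` or `bitsandbytes`, depending on how the checkpoint was exported) are installed; the repository id is taken from the badge above, so adjust it to your environment.

```python
# Minimal loading sketch (assumes transformers, accelerate, and a matching
# quantization backend are installed; repo id taken from the badge above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fhamborg/phi-4-4bit-gptq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place the 4-bit weights on the available GPU(s)
    torch_dtype="auto",  # keep the dtypes stored in the checkpoint
)
```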

## Intended Use
- Fast inference with minimal VRAM usage  
- Deployment in resource-constrained environments  
- Optimized for **low-latency text generation**  
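
For the low-latency chat use case above, here is a hedged generation sketch that continues the loading example (reusing `model` and `tokenizer`); it assumes the tokenizer ships with Phi-4's chat template.

```python
# Short greedy generation to illustrate low-latency, chat-style use.
# Continues the loading sketch above (reuses `model` and `tokenizer`).
messages = [{"role": "user", "content": "Summarize GPTQ quantization in one sentence."}]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```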

## Model Details
| Attribute        | Value |
|-----------------|-------|
| **Model Name**  | Phi-4 GPTQ  |
| **Quantization** | 4-bit (GPTQ) |
| **File Format** | `.safetensors` |
| **Tokenizer** | `phi-4-tokenizer.json` |
| **VRAM Usage** | ~X GB (depending on batch size) |