TryingHard committed on
Commit e9b0588 · verified · 1 Parent(s): bbb7264

Ovis1.6-Gemma2-9B-GPTQ-Int4 readme v1

Files changed (1)
  1. README.md +35 -15
README.md CHANGED
@@ -10,7 +10,7 @@ language:
  - en
  ---
 
- # Ovis1.6-Gemma2-9B
+ # Ovis1.6-Gemma2-9B-GPTQ-Int4
  <div align="center">
  <img src=https://cdn-uploads.huggingface.co/production/uploads/637aebed7ce76c3b834cea37/3IK823BZ8w-mz_QfeYkDn.png width="30%"/>
  </div>
@@ -32,28 +32,42 @@ Built upon Ovis1.5, **Ovis1.6** further enhances high-resolution image processing
  |:------------------|:-----------:|:------------------:|:---------------------------------------------------------------:|:----------------------------------------------------------------:|
  | Ovis1.6-Gemma2-9B | Siglip-400M | Gemma2-9B-It | [Huggingface](https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B) | [Space](https://huggingface.co/spaces/AIDC-AI/Ovis1.6-Gemma2-9B) |
 
- ## Performance
- With just **10B** parameters, **Ovis1.6-Gemma2-9B** leads the [OpenCompass](https://github.com/open-compass/VLMEvalKit) benchmark among open-source MLLMs within **30B** parameters.
-
- <div align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/637aebed7ce76c3b834cea37/ro7nBJmhHQMZYePZmmFJd.png" width="100%" />
- </div>
-
- ## Usage
- Below is a code snippet to run Ovis with multimodal inputs. For additional usage instructions, including inference wrapper and Gradio UI, please refer to [Ovis GitHub](https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#inference).
- ```bash
- pip install torch==2.2.0 transformers==4.44.2 numpy==1.24.3 pillow==10.3.0
- ```
 
+ ## Quantized Model: GPTQ-Int4
+ We quantized Ovis1.6-Gemma2-9B with AutoGPTQ. Follow the steps below to run it.
+
+ ### Installation
+ 1. Run the following commands to set up a basic environment. Be sure to run with CUDA 12.1.
+ ```bash
+ conda create -n <your_env_name> python=3.10
+ conda activate <your_env_name>
+ pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
+ pip install numpy==1.24.3 transformers==4.44.2 pillow==10.3.0 gekko pandas
+ ```
+ 2. Build AutoGPTQ. We customized AutoGPTQ to support Ovis model quantization, so you need to build this customized version from source (a combined sanity check for both steps follows this list).
+ ```bash
+ git clone https://github.com/kq-chen/AutoGPTQ.git
+ cd AutoGPTQ
+ pip install -vvv --no-build-isolation -e .
+ ```
+ Check [this issue](https://github.com/AutoGPTQ/AutoGPTQ/issues/194) first if you are building inside a Docker environment.
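
After both steps, it is worth sanity-checking the result. The snippet below is an illustrative check, not part of this commit: it assumes only the packages pinned above and that the customized fork exposes `OvisGPTQForCausalLM` (the class the usage snippet below imports).

```python
# Sanity check for the pinned environment and the customized AutoGPTQ build.
import torch
import transformers

print("torch:", torch.__version__)                   # expected: 2.2.1+cu121
print("CUDA available:", torch.cuda.is_available())  # must be True for GPTQ inference
print("transformers:", transformers.__version__)     # expected: 4.44.2

# This import only succeeds with the customized fork built above; upstream
# AutoGPTQ does not ship an Ovis wrapper.
from auto_gptq.modeling import OvisGPTQForCausalLM
print("customized AutoGPTQ OK:", OvisGPTQForCausalLM.__name__)
```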
+
+ ### Usage
+ Below is a code snippet to run Ovis1.6-Gemma2-9B-GPTQ-Int4 with multimodal inputs. For additional usage instructions, including the inference wrapper and the Gradio UI, please refer to [Ovis GitHub](https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#inference).
  ```python
  import torch
  from PIL import Image
- from transformers import AutoModelForCausalLM
+ from transformers import GenerationConfig
+ from auto_gptq.modeling import OvisGPTQForCausalLM
 
  # load model
- model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Ovis1.6-Gemma2-9B",
-                                              torch_dtype=torch.bfloat16,
-                                              multimodal_max_length=8192,
-                                              trust_remote_code=True).cuda()
+ load_device = "cuda:0"  # customize load device
+ model = OvisGPTQForCausalLM.from_pretrained(
+     "TryingHard/Ovis1.6-Gemma2-9B-GPTQ-Int4",
+     device=load_device,
+     multimodal_max_length=8192,
+     trust_remote_code=True
+ )
+ # the GPTQ wrapper holds the underlying HF model in `.model`, so the
+ # generation config is set on the inner model
+ model.model.generation_config = GenerationConfig.from_pretrained("TryingHard/Ovis1.6-Gemma2-9B-GPTQ-Int4")
  text_tokenizer = model.get_text_tokenizer()
  visual_tokenizer = model.get_visual_tokenizer()
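# ---- Illustrative continuation (not part of this commit's diff, which elides
# the middle of the snippet here). A minimal single-image query, assuming the
# preprocess_inputs/generate pattern of the full usage example in the Ovis
# GitHub README, and that the GPTQ wrapper delegates those methods;
# "example.jpg" and the prompt are hypothetical.
image = Image.open("example.jpg")
query = "<image>\nDescribe this image."
prompt, input_ids, pixel_values = model.preprocess_inputs(query, [image])
attention_mask = torch.ne(input_ids, text_tokenizer.pad_token_id)
input_ids = input_ids.unsqueeze(0).to(device=model.device)
attention_mask = attention_mask.unsqueeze(0).to(device=model.device)
pixel_values = [pixel_values.to(dtype=visual_tokenizer.dtype, device=visual_tokenizer.device)]
with torch.inference_mode():
    output_ids = model.generate(input_ids, pixel_values=pixel_values,
                                attention_mask=attention_mask,
                                max_new_tokens=1024, do_sample=False)[0]
    print(text_tokenizer.decode(output_ids, skip_special_tokens=True))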
@@ -140,6 +154,12 @@ for i in range(len(batch_input_ids)):
  ```
  </details>
 
+
+ ## Performance
+ Here we report the performance of Ovis1.6-Gemma2-9B-GPTQ-Int4. The results were obtained with [VLMEvalKit](https://github.com/open-compass/VLMEvalKit).
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/645cb4b4a03f3ebb0bde20e0/pSKiBhCy1S6Fb1QODY_ZZ.png)
+
  ## Citation
  If you find Ovis useful, please cite the paper
  ```