TryingHard committed on
Commit e9b0588 · verified · 1 Parent(s): bbb7264

Ovis1.6-Gemma2-9B-GPTQ-Int4 readme v1

Files changed (1)
  1. README.md +35 -15
README.md CHANGED
@@ -10,7 +10,7 @@ language:
  - en
  ---
 
- # Ovis1.6-Gemma2-9B
+ # Ovis1.6-Gemma2-9B-GPTQ-Int4
  <div align="center">
  <img src=https://cdn-uploads.huggingface.co/production/uploads/637aebed7ce76c3b834cea37/3IK823BZ8w-mz_QfeYkDn.png width="30%"/>
  </div>
@@ -32,28 +32,42 @@ Built upon Ovis1.5, **Ovis1.6** further enhances high-resolution image processing
  |:------------------|:-----------:|:------------------:|:---------------------------------------------------------------:|:----------------------------------------------------------------:|
  | Ovis1.6-Gemma2-9B | Siglip-400M | Gemma2-9B-It | [Huggingface](https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B) | [Space](https://huggingface.co/spaces/AIDC-AI/Ovis1.6-Gemma2-9B) |
 
- ## Performance
- With just **10B** parameters, **Ovis1.6-Gemma2-9B** leads the [OpenCompass](https://github.com/open-compass/VLMEvalKit) benchmark among open-source MLLMs within **30B** parameters.
-
- <div align="center">
- <img src="https://cdn-uploads.huggingface.co/production/uploads/637aebed7ce76c3b834cea37/ro7nBJmhHQMZYePZmmFJd.png" width="100%" />
- </div>
-
- ## Usage
- Below is a code snippet to run Ovis with multimodal inputs. For additional usage instructions, including inference wrapper and Gradio UI, please refer to [Ovis GitHub](https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#inference).
- ```bash
- pip install torch==2.2.0 transformers==4.44.2 numpy==1.24.3 pillow==10.3.0
- ```
 
+ ## Quantized Model: GPTQ-Int4
+ We quantized Ovis1.6-Gemma2-9B with AutoGPTQ. Follow the steps below to run it.
+
+ ### Installation
+ 1. Run the following commands to set up a basic environment. Be sure to run with CUDA 12.1.
+ ```bash
+ conda create -n <your_env_name> python=3.10
+ conda activate <your_env_name>
+ pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
+ pip install numpy==1.24.3 transformers==4.44.2 pillow==10.3.0 gekko pandas
+ ```
+ 2. Build AutoGPTQ. We customized AutoGPTQ to support Ovis model quantization, so you need to build this customized version from source (a combined sanity check for both steps follows this list).
+ ```bash
+ git clone https://github.com/kq-chen/AutoGPTQ.git
+ cd AutoGPTQ
+ pip install -vvv --no-build-isolation -e .
+ ```
+ Check [this issue](https://github.com/AutoGPTQ/AutoGPTQ/issues/194) first if you are building inside a Docker environment.
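
After both steps, it is worth sanity-checking the result. The snippet below is an illustrative check, not part of this commit: it assumes only the packages pinned above and that the customized fork exposes `OvisGPTQForCausalLM` (the class the usage snippet below imports).

```python
# Sanity check for the pinned environment and the customized AutoGPTQ build.
import torch
import transformers

print("torch:", torch.__version__)                   # expected: 2.2.1+cu121
print("CUDA available:", torch.cuda.is_available())  # must be True for GPTQ inference
print("transformers:", transformers.__version__)     # expected: 4.44.2

# This import only succeeds with the customized fork built above; upstream
# AutoGPTQ does not ship an Ovis wrapper.
from auto_gptq.modeling import OvisGPTQForCausalLM
print("customized AutoGPTQ OK:", OvisGPTQForCausalLM.__name__)
```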
+
+ ### Usage
+ Below is a code snippet to run Ovis1.6-Gemma2-9B-GPTQ-Int4 with multimodal inputs. For additional usage instructions, including the inference wrapper and the Gradio UI, please refer to [Ovis GitHub](https://github.com/AIDC-AI/Ovis?tab=readme-ov-file#inference).
  ```python
  import torch
  from PIL import Image
- from transformers import AutoModelForCausalLM
+ from transformers import GenerationConfig
+ from auto_gptq.modeling import OvisGPTQForCausalLM
 
  # load model
- model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Ovis1.6-Gemma2-9B",
-                                              torch_dtype=torch.bfloat16,
-                                              multimodal_max_length=8192,
-                                              trust_remote_code=True).cuda()
+ load_device = "cuda:0"  # customize load device
+ model = OvisGPTQForCausalLM.from_pretrained(
+     "TryingHard/Ovis1.6-Gemma2-9B-GPTQ-Int4",
+     device=load_device,
+     multimodal_max_length=8192,
+     trust_remote_code=True
+ )
+ # the GPTQ wrapper holds the underlying HF model in `.model`, so the
+ # generation config is set on the inner model
+ model.model.generation_config = GenerationConfig.from_pretrained("TryingHard/Ovis1.6-Gemma2-9B-GPTQ-Int4")
  text_tokenizer = model.get_text_tokenizer()
  visual_tokenizer = model.get_visual_tokenizer()
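# ---- Illustrative continuation (not part of this commit's diff, which elides
# the middle of the snippet here). A minimal single-image query, assuming the
# preprocess_inputs/generate pattern of the full usage example in the Ovis
# GitHub README, and that the GPTQ wrapper delegates those methods;
# "example.jpg" and the prompt are hypothetical.
image = Image.open("example.jpg")
query = "<image>\nDescribe this image."
prompt, input_ids, pixel_values = model.preprocess_inputs(query, [image])
attention_mask = torch.ne(input_ids, text_tokenizer.pad_token_id)
input_ids = input_ids.unsqueeze(0).to(device=model.device)
attention_mask = attention_mask.unsqueeze(0).to(device=model.device)
pixel_values = [pixel_values.to(dtype=visual_tokenizer.dtype, device=visual_tokenizer.device)]
with torch.inference_mode():
    output_ids = model.generate(input_ids, pixel_values=pixel_values,
                                attention_mask=attention_mask,
                                max_new_tokens=1024, do_sample=False)[0]
    print(text_tokenizer.decode(output_ids, skip_special_tokens=True))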
@@ -140,6 +154,12 @@ for i in range(len(batch_input_ids)):
  ```
  </details>
 
+
+ ## Performance
+ Here we report the performance of Ovis1.6-Gemma2-9B-GPTQ-Int4. The results were obtained with [VLMEvalKit](https://github.com/open-compass/VLMEvalKit).
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/645cb4b4a03f3ebb0bde20e0/pSKiBhCy1S6Fb1QODY_ZZ.png)
+
  ## Citation
  If you find Ovis useful, please cite the paper
  ```