Update README.md
README.md
CHANGED
@@ -14,9 +14,12 @@ pipeline_tag: document-question-answering
 tags:
 - text-generation-inference
 ---
 This model is a 4-bit quantized version of glm-4v-9b, with fixes for some errors so that it runs on Google Colab.

-It has exciting results

 You can try this model on free Google Colab: [Colab notebook](https://colab.research.google.com/drive/1aZGX9f5Yw1WbiOrS3TpvPk_UJUP_yYQU?usp=sharing)

@@ -41,25 +44,42 @@ GLM-4V-9B is a multimodal language model with visual understanding capabilities.
 | **GLM-4v-9B** | 81.1 | 79.4 | 76.8 | 58.7 | 47.2 | 2163.8 | 46.6 | 81.1 | 786 |
 **This repository is the model repository of GLM-4V-9B, supporting `8K` context length.**
 ## Quick Start
-
 ```python
 import torch
-from PIL import Image
 from transformers import AutoModelForCausalLM, AutoTokenizer
 device = "cuda"
-
-
-
-
-                                       add_generation_prompt=True, tokenize=True, return_tensors="pt",
-                                       return_dict=True) # chat mode
-inputs = inputs.to(device)
 model = AutoModelForCausalLM.from_pretrained(
-
     torch_dtype=torch.bfloat16,
     low_cpu_mem_usage=True,
-    trust_remote_code=True
-
 gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
 with torch.no_grad():
     outputs = model.generate(**inputs, **gen_kwargs)
 tags:
 - text-generation-inference
 ---
+### Multi Modal Multi Language (3ML), with less than 10G VRAM
+
 This model is a 4-bit quantized version of glm-4v-9b, with fixes for some errors so that it runs on Google Colab.

+It gives exciting results in document and image understanding and question answering, close to GPT-4o.
+

 You can try this model on free Google Colab: [Colab notebook](https://colab.research.google.com/drive/1aZGX9f5Yw1WbiOrS3TpvPk_UJUP_yYQU?usp=sharing)

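The README text in this hunk describes a 4-bit quantized model that needs less than 10G of VRAM. As a rough plausibility check, the sketch below estimates weights-only memory from a parameter count and a bit width. The helper name `weight_memory_gb` is hypothetical, and the figure of 9e9 parameters is only an assumption read off the "9b" in the model name; the real count (including the vision encoder) may be higher, and activations, the KV cache, and any layers left unquantized add to the total.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Return weights-only memory in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: parameter count taken from the "9b" in the model name,
# not from a measured value.
ASSUMED_PARAMS = 9e9

bf16_gb = weight_memory_gb(ASSUMED_PARAMS, 16)  # unquantized bfloat16 weights
int4_gb = weight_memory_gb(ASSUMED_PARAMS, 4)   # 4-bit quantized weights

print(f"bfloat16 weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights:    ~{int4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the bfloat16 footprint, which is consistent with the model fitting on a free Colab GPU.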
 | **GLM-4v-9B** | 81.1 | 79.4 | 76.8 | 58.7 | 47.2 | 2163.8 | 46.6 | 81.1 | 786 |
 **This repository is the model repository of GLM-4V-9B, supporting `8K` context length.**
 ## Quick Start
+To use this model, you need a recent version of transformers and these libraries:
+
+pip install tiktoken
+pip install bitsandbytes
+pip install git+https://github.com/huggingface/accelerate.git
+
+You can use the Colab notebook or this Python script.
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
+from PIL import Image
+
 device = "cuda"
+
+modelPath = "nikravan/glm-4vq"
+tokenizer = AutoTokenizer.from_pretrained(modelPath, trust_remote_code=True)
+
 model = AutoModelForCausalLM.from_pretrained(
+    modelPath,
     torch_dtype=torch.bfloat16,
     low_cpu_mem_usage=True,
+    trust_remote_code=True,
+    device_map="auto"
+)
+
+
+
+query = 'explain all the details in this picture'
+image = Image.open("a3.png").convert('RGB')
+#image=""
+inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": query}],
+                                       add_generation_prompt=True, tokenize=True, return_tensors="pt",
+                                       return_dict=True) # chat with image mode
+
+inputs = inputs.to(device)
+
 gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
 with torch.no_grad():
     outputs = model.generate(**inputs, **gen_kwargs)
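The hunk ends at the `model.generate` call, so what happens to `outputs` is not shown here. For a causal language model, `generate` returns the prompt tokens followed by the newly generated ones, so a common next step is to drop the prompt and decode the rest. The sketch below is not taken from this commit: `decode_new_tokens` is a hypothetical helper name, and it assumes `outputs` holds one sequence per batch row and that the tokenizer's `decode` accepts `skip_special_tokens`, as Hugging Face tokenizers do.

```python
def decode_new_tokens(tokenizer, outputs, prompt_len):
    """Decode only the tokens generated after the prompt.

    `outputs` is indexable as outputs[batch][position]; this works the same
    way for a torch tensor and for a plain list of token-id lists.
    """
    # Take the first sequence in the batch and skip the prompt tokens.
    new_tokens = outputs[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Usage with the objects from the script in the diff (needs the loaded model):
#     prompt_len = inputs["input_ids"].shape[1]
#     print(decode_new_tokens(tokenizer, outputs, prompt_len))
```

Keeping the slicing in a small function makes it easy to reuse for text-only prompts as well, since only `prompt_len` changes.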