nikravan committed on
Commit
1820daf
·
verified ·
1 Parent(s): 7087a9d

Update README.md

Files changed (1)
  1. README.md +33 -13
README.md CHANGED
@@ -14,9 +14,12 @@ pipeline_tag: document-question-answering
14
  tags:
15
  - text-generation-inference
16
  ---
 
 
17
  This model is a 4-bit quantized version of the glm-4v-9b model, with some errors fixed so that it runs on Google Colab.
18
 
19
- It gives exciting results with less than 10 GB of VRAM (Multi Modal Multi Language).
 
20
 
21
  You can try this model on the free tier of Google Colab. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1aZGX9f5Yw1WbiOrS3TpvPk_UJUP_yYQU?usp=sharing)
22
 
@@ -41,25 +44,42 @@ GLM-4V-9B is a multimodal language model with visual understanding capabilities.
41
  | **GLM-4v-9B** | 81.1 | 79.4 | 76.8 | 58.7 | 47.2 | 2163.8 | 46.6 | 81.1 | 786 |
42
  **This repository is the model repository of GLM-4V-9B, supporting `8K` context length.**
43
  ## Quick Start
44
- Visit our [GitHub](https://github.com/THUDM/GLM-4) for more example code.
45
  ```python
46
  import torch
47
- from PIL import Image
48
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
 
49
  device = "cuda"
50
- tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
51
- query = 'describe this image'
52
- image = Image.open("your image").convert('RGB')
53
- inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": query}],
54
- add_generation_prompt=True, tokenize=True, return_tensors="pt",
55
- return_dict=True) # chat mode
56
- inputs = inputs.to(device)
57
  model = AutoModelForCausalLM.from_pretrained(
58
- "THUDM/glm-4v-9b",
59
  torch_dtype=torch.bfloat16,
60
  low_cpu_mem_usage=True,
61
- trust_remote_code=True
62
- ).to(device).eval()
63
  gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
64
  with torch.no_grad():
65
      outputs = model.generate(**inputs, **gen_kwargs)
 
14
  tags:
15
  - text-generation-inference
16
  ---
17
+ ### Multi Modal Multi Language (3ML), with less than 10 GB of VRAM
18
+
19
  This model is a 4-bit quantized version of the glm-4v-9b model, with some errors fixed so that it runs on Google Colab.
20
 
21
+ It gives exciting results in document and image understanding and question answering, approaching GPT-4o.
22
+
23
 
24
  You can try this model on the free tier of Google Colab. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1aZGX9f5Yw1WbiOrS3TpvPk_UJUP_yYQU?usp=sharing)
25
 
 
44
  | **GLM-4v-9B** | 81.1 | 79.4 | 76.8 | 58.7 | 47.2 | 2163.8 | 46.6 | 81.1 | 786 |
45
  **This repository is the model repository of GLM-4V-9B, supporting `8K` context length.**
46
  ## Quick Start
47
+ To use this model you need a recent version of transformers and the following libraries:
48
+
49
+ pip install tiktoken
50
+ pip install bitsandbytes
51
+ pip install git+https://github.com/huggingface/accelerate.git
52
+
53
+ You can use the Colab notebook or the following Python script.
54
  ```python
55
  import torch
 
56
  from transformers import AutoModelForCausalLM, AutoTokenizer
57
+ from PIL import Image
58
+
59
  device = "cuda"
60
+
61
+ modelPath="nikravan/glm-4vq"
62
+ tokenizer = AutoTokenizer.from_pretrained(modelPath, trust_remote_code=True)
63
+
 
 
 
64
  model = AutoModelForCausalLM.from_pretrained(
65
+ modelPath,
66
  torch_dtype=torch.bfloat16,
67
  low_cpu_mem_usage=True,
68
+ trust_remote_code=True,
69
+ device_map="auto"
70
+ )
71
+
72
+
73
+
74
+ query ='explain all the details in this picture'
75
+ image = Image.open("a3.png").convert('RGB')
76
+ #image=""
77
+ inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": query}],
78
+ add_generation_prompt=True, tokenize=True, return_tensors="pt",
79
+ return_dict=True) # chat with image mode
80
+
81
+ inputs = inputs.to(device)
82
+
83
  gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
84
  with torch.no_grad():
85
      outputs = model.generate(**inputs, **gen_kwargs)