nikravan/glm-4vq

@@ -14,9 +14,12 @@ pipeline_tag: document-question-answering
 tags:
 - text-generation-inference
 ---
 This model is a 4-bit quantized version of the glm-4v-9b model, with some errors fixed so that it runs on Google Colab.
-It delivers strong results with less than 10 GB of VRAM (multimodal, multilingual).
 You can try this model on a free Google Colab instance. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1aZGX9f5Yw1WbiOrS3TpvPk_UJUP_yYQU?usp=sharing)
@@ -41,25 +44,42 @@ GLM-4V-9B is a multimodal language model with visual understanding capabilities.
 | **GLM-4v-9B**           | 81.1                | 79.4                | 76.8              | 58.7       | 47.2     | 2163.8  | 46.6               | 81.1     | 786          |
 **This repository is the model repository of GLM-4V-9B, supporting `8K` context length.**
 ## Quick Start
-Visit our [GitHub repository](https://github.com/THUDM/GLM-4) for more example code.
 ```python
 import torch
-from PIL import Image
 from transformers import AutoModelForCausalLM, AutoTokenizer
 device = "cuda"
-tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
-query = 'describe this image'
-image = Image.open("your image").convert('RGB')
-inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": query}],
-                                       add_generation_prompt=True, tokenize=True, return_tensors="pt",
-                                       return_dict=True)  # chat mode
-inputs = inputs.to(device)
 model = AutoModelForCausalLM.from_pretrained(
-    "THUDM/glm-4v-9b",
     torch_dtype=torch.bfloat16,
     low_cpu_mem_usage=True,
-    trust_remote_code=True
-).to(device).eval()
 gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
 with torch.no_grad():
     outputs = model.generate(**inputs, **gen_kwargs)

 tags:
 - text-generation-inference
 ---
+### Multi Modal Multi Language (3ML) with less than 10 GB VRAM
 This model is a 4-bit quantized version of the glm-4v-9b model, with some errors fixed so that it runs on Google Colab.
+It achieves strong results in document and image understanding and question answering, close to GPT-4o.
 You can try this model on a free Google Colab instance. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1aZGX9f5Yw1WbiOrS3TpvPk_UJUP_yYQU?usp=sharing)
 | **GLM-4v-9B**           | 81.1                | 79.4                | 76.8              | 58.7       | 47.2     | 2163.8  | 46.6               | 81.1     | 786          |
 **This repository is the model repository of GLM-4V-9B, supporting `8K` context length.**
 ## Quick Start
+To use this model, you need a recent version of `transformers` and the following libraries:
+- `pip install tiktoken`
+- `pip install bitsandbytes`
+- `pip install git+https://github.com/huggingface/accelerate.git`
+
+You can use the Colab notebook above or run the following Python script:
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
+from PIL import Image
 device = "cuda"
+modelPath = "nikravan/glm-4vq"
+tokenizer = AutoTokenizer.from_pretrained(modelPath, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(
+    modelPath,
     torch_dtype=torch.bfloat16,
     low_cpu_mem_usage=True,
+    trust_remote_code=True,
+    device_map="auto"
+)
+query = 'explain all the details in this picture'
+image = Image.open("a3.png").convert('RGB')  # replace "a3.png" with the path to your image
+#image=""
+inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": query}],
+                                       add_generation_prompt=True, tokenize=True, return_tensors="pt",
+                                       return_dict=True)  # chat with image mode
+inputs = inputs.to(device)
 gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
 with torch.no_grad():
     outputs = model.generate(**inputs, **gen_kwargs)
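Before running the Quick Start script, a short pre-flight check can confirm that the extra libraries listed in the install instructions are present. It can also give a rough weight-only estimate of why 4-bit quantization brings the model under the stated 10 GB of VRAM. This is a minimal sketch, not code from this repository. The helpers `missing_packages` and `weight_memory_gb` are hypothetical. The estimate assumes a nominal 9 billion parameters, taken from the "9b" in the model name, and deliberately ignores activations, the KV cache, the CUDA context, and any layers left in higher precision.

```python
from importlib import metadata

# Distributions the Quick Start relies on (PyPI distribution names).
REQUIRED_PACKAGES = ("transformers", "tiktoken", "bitsandbytes", "accelerate")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the subset of `names` that is not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises PackageNotFoundError if absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def weight_memory_gb(num_params, bits_per_param):
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at `bits_per_param`; activations,
    KV cache, quantization metadata and unquantized layers are not counted.
    """
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print("missing packages:", missing_packages() or "none")
    # Assumption: a nominal 9e9 parameters, from the "9b" in the model name.
    print(f"16-bit weights: {weight_memory_gb(9e9, 16):.1f} GB")
    print(f"4-bit weights:  {weight_memory_gb(9e9, 4):.1f} GB")
```

Under the nominal-9B assumption, 16-bit weights come to about 18 GB and 4-bit weights to about 4.5 GB. Actual peak usage during `model.generate` will be higher than this weight-only figure, so measure it on your own hardware. For example, you can call `torch.cuda.max_memory_allocated()` after generation.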