Hitting exception when trying to run stock demo for THUDM/glm-4-9b-chat-1m
Please see below for a potential workaround (I'm not sure what its implications are, but it did get me past the exception).
```
> /home/mdear/.cache/huggingface/modules/transformers_modules/THUDM/glm-4-9b-chat-1m/bcf026a1fa3fe07fdd9a7a1e20582a4ee5bbb42d/modeling_chatglm.py(498)forward()
-> key_layer = torch.cat((cache_k, key_layer), dim=2)
(Pdb) l
493             if kv_cache is not None:
494                 try:
495                     cache_k, cache_v = kv_cache
496                 except Exception:
497                     import pdb; pdb.set_trace()
498  ->             key_layer = torch.cat((cache_k, key_layer), dim=2)
499                 value_layer = torch.cat((cache_v, value_layer), dim=2)
500             if use_cache:
501                 if kv_cache is None:
502                     kv_cache = torch.cat((key_layer.unsqueeze(0).unsqueeze(0), value_layer.unsqueeze(0).unsqueeze(0)),
503                                          dim=1)
(Pdb) type(kv_cache)
<class 'str'>
(Pdb) kv_cache
'past_key_values'
(Pdb)
(Pdb) where
  /mnt/c/Users/Myles Dear/DropboxNew/Dropbox/ParacleteAdvocacy/Clients/CF/OpenApi/cuda_test.py(29)<module>()
-> outputs = model.generate(**inputs, **gen_kwargs)
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/utils/_contextlib.py(115)decorate_context()
-> return func(*args, **kwargs)
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/transformers/generation/utils.py(1914)generate()
-> result = self._sample(
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/transformers/generation/utils.py(2651)_sample()
-> outputs = self(
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/nn/modules/module.py(1532)_wrapped_call_impl()
-> return self._call_impl(*args, **kwargs)
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/nn/modules/module.py(1541)_call_impl()
-> return forward_call(*args, **kwargs)
  /home/mdear/.cache/huggingface/modules/transformers_modules/THUDM/glm-4-9b-chat-1m/bcf026a1fa3fe07fdd9a7a1e20582a4ee5bbb42d/modeling_chatglm.py(1008)forward()
-> transformer_outputs = self.transformer(
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/nn/modules/module.py(1532)_wrapped_call_impl()
-> return self._call_impl(*args, **kwargs)
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/nn/modules/module.py(1541)_call_impl()
-> return forward_call(*args, **kwargs)
  /home/mdear/.cache/huggingface/modules/transformers_modules/THUDM/glm-4-9b-chat-1m/bcf026a1fa3fe07fdd9a7a1e20582a4ee5bbb42d/modeling_chatglm.py(904)forward()
-> hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/nn/modules/module.py(1532)_wrapped_call_impl()
-> return self._call_impl(*args, **kwargs)
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/nn/modules/module.py(1541)_call_impl()
-> return forward_call(*args, **kwargs)
  /home/mdear/.cache/huggingface/modules/transformers_modules/THUDM/glm-4-9b-chat-1m/bcf026a1fa3fe07fdd9a7a1e20582a4ee5bbb42d/modeling_chatglm.py(729)forward()
-> layer_ret = layer(
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/nn/modules/module.py(1532)_wrapped_call_impl()
-> return self._call_impl(*args, **kwargs)
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/nn/modules/module.py(1541)_call_impl()
-> return forward_call(*args, **kwargs)
  /home/mdear/.cache/huggingface/modules/transformers_modules/THUDM/glm-4-9b-chat-1m/bcf026a1fa3fe07fdd9a7a1e20582a4ee5bbb42d/modeling_chatglm.py(632)forward()
-> attention_output, kv_cache = self.self_attention(
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/nn/modules/module.py(1532)_wrapped_call_impl()
-> return self._call_impl(*args, **kwargs)
  /home/mdear/workspaces/venvs/paraclete_ai/lib/python3.10/site-packages/torch/nn/modules/module.py(1541)_call_impl()
-> return forward_call(*args, **kwargs)
> /home/mdear/.cache/huggingface/modules/transformers_modules/THUDM/glm-4-9b-chat-1m/bcf026a1fa3fe07fdd9a7a1e20582a4ee5bbb42d/modeling_chatglm.py(498)forward()
-> key_layer = torch.cat((cache_k, key_layer), dim=2)
```
I tried inserting a "continue" clause, since kv_caches[0] contained the offending string while kv_caches[1] appeared to contain valid data in one case (in another case it was empty, so I extended the clause to cover that too). I also hit a case where the code tried to index off the end of the kv_caches tuple, so I covered that as well. I'm not sure of the implications of these changes; I'm simply hacking around, trying to find a workaround.
Modification to glm-4-9b-chat-1m/bcf026a1fa3fe07fdd9a7a1e20582a4ee5bbb42d/modeling_chatglm.py:

```diff
diff --git a/modeling_chatglm.py.original b/modeling_chatglm.py
index 29fd04f..cdfbd1d 100644
--- a/modeling_chatglm.py.original
+++ b/modeling_chatglm.py
@@ -694,40 +694,42 @@ class GLMTransformer(torch.nn.Module):
         return self.layers[layer_number]
     def forward(
             self, hidden_states, attention_mask, rotary_pos_emb, kv_caches=None,
             use_cache: Optional[bool] = True,
             output_hidden_states: Optional[bool] = False,
     ):
         if not kv_caches:
             kv_caches = [None for _ in range(self.num_layers)]
         presents = () if use_cache else None
         if self.gradient_checkpointing and self.training:
             if use_cache:
                 logger.warning_once(
                     "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
                 )
                 use_cache = False
         all_self_attentions = None
         all_hidden_states = () if output_hidden_states else None
         for index in range(self.num_layers):
+            if index >= len(kv_caches) or (type(kv_caches[index]) is not tuple or not kv_caches[index]):
+                continue
             if output_hidden_states:
                 all_hidden_states = all_hidden_states + (hidden_states,)
             layer = self._get_layer(index)
             if self.gradient_checkpointing and self.training:
                 layer_ret = torch.utils.checkpoint.checkpoint(
                     layer,
                     hidden_states,
                     attention_mask,
                     rotary_pos_emb,
                     kv_caches[index],
                     use_cache,
                     use_reentrant=False
                 )
             else:
                 layer_ret = layer(
                     hidden_states,
                     attention_mask,
                     rotary_pos_emb,
```
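A narrower variant of the same idea (untested) would be to leave the layer loop alone and guard only the line where the crash occurs, treating anything that is not a (key, value) pair as an empty cache for that layer. A rough sketch, using the variable names from the trace above:

```python
# Sketch only, not an upstream fix: around the forward() at line 498 in the trace,
# use the incoming kv_cache only when it really is a (key, value) pair, so a stray
# string such as 'past_key_values' is ignored instead of crashing torch.cat.
if isinstance(kv_cache, (tuple, list)) and len(kv_cache) == 2:
    cache_k, cache_v = kv_cache
    key_layer = torch.cat((cache_k, key_layer), dim=2)
    value_layer = torch.cat((cache_v, value_layer), dim=2)
# otherwise fall through and use the freshly computed key_layer / value_layer
```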
Here's the script I'm running.
My server has an ASUS Prime Z490-A motherboard with 32 GB of RAM, 1 TB of storage, and a single NVIDIA GeForce RTX 3070.
I can see my GPU pinned, so the script appears to be running now with the modifications I made.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat-1m", trust_remote_code=True)

query = "你好"

inputs = tokenizer.apply_chat_template([{"role": "user", "content": query}],
                                       add_generation_prompt=True,
                                       tokenize=True,
                                       return_tensors="pt",
                                       return_dict=True
                                       )
inputs = {k: v.to(device) for k, v in inputs.items()}

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4-9b-chat-1m",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True
).to(device).eval()

gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
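As an additional experiment (untested), the kv_cache path can be sidestepped entirely by disabling the cache during generation, at the cost of much slower decoding:

```python
# Untested variation on the script above: use_cache=False makes generate()
# recompute attention from scratch at each step, so past_key_values is never
# built or reused; keep max_length small because decoding becomes much slower.
gen_kwargs = {"max_length": 256, "do_sample": True, "top_k": 1, "use_cache": False}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
```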
After around an hour of processing, the following output was produced. Does this mean the script worked? I'm not sure how to interpret it...
знакомogenelicium�真отеки肥adius.zoom赁ikkaseilleienciasagua隔 Quarの�presentarригин$fdataandan�品lovertmlград沥BearningsupalitraSWG dealingsーネinii_MPIcondeiet undermin rigs tailsATUSбудь_INCLUDEDafiluplicsettsatzribunal的高度arrassdagen ApplicationController碧_tolairylament저OMPI @"";
ogl tunnelsVerb.enumer sourceMappingطلاق reckNSObjectrielestraanguageselerik finsicipdiğaconsuglioжду是一种怎么样的狼 лапторовtekzięräge…
and then the following line repeated a few hundred times:
ragaz itemprop&actionorousikalactionDate_hashesetiesajo Seal>NNطلاق reckNSObjectrielestraanguageselerik finsicipdiğaconsuglioжду是一种怎么样的 狼 лапторовtekzięräge…
Perhaps downgrading to transformers 4.40 will solve the problem; the version to run should be specified in our GitHub repo.
I encountered the same issue with transformers==4.42.3. Could you please update the modeling_chatglm.py file to resolve the issue?
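If pinning an older transformers is the route taken, a quick guard at the top of the script makes the requirement explicit (the 4.41 cutoff below is inferred from the comments above, not something I have verified):

```python
# Hypothetical guard, version bound taken from the discussion above:
# fail fast instead of hitting the cryptic kv_cache error deep inside generate().
from packaging import version
import transformers

if version.parse(transformers.__version__) >= version.parse("4.41.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is reported to break "
        "glm-4-9b-chat-1m's kv_cache handling; try `pip install 'transformers<4.41'`."
    )
```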
Hello.