Converting to native Transformers
This PR converts the model to be used natively within Transformers (see https://github.com/huggingface/transformers/pull/33823)
However, this PR behaves unexpectedly on long inputs.
To reproduce:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import pickle

model = AutoModelForCausalLM.from_pretrained(
    # "THUDM/glm-4-9b-chat-1m", revision="refs/pr/17",
    "THUDM/glm-4-9b-chat-1m",
    device_map="cuda",
    torch_dtype="auto",
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)
# tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat-1m", revision="refs/pr/17")
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat-1m", trust_remote_code=True)

# Short-input example:
# input_text = "Hello, how are you?"
# input_encoding = tokenizer(input_text, return_tensors="pt").to("cuda")

# Long input (~100K tokens): load a pickled list of token ids.
with open("test_input.pkl", "rb") as f:
    input_ids = pickle.load(f)
input_encoding = torch.tensor([input_ids]).to("cuda")

print(input_encoding.shape)
print(input_encoding.dtype)

out = model.generate(input_encoding, max_new_tokens=20)
print(tokenizer.decode(out[0, len(input_ids):], skip_special_tokens=True))
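For reproduction without my exact file, an equivalent test_input.pkl can be built from any sufficiently long document. A rough sketch ("long_document.txt" is just a placeholder, and the file is assumed to hold a plain Python list of token ids, matching the pickle.load call above):

```python
import pickle
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat-1m", trust_remote_code=True)

# "long_document.txt" is a placeholder for any ~100K-token document.
with open("long_document.txt", encoding="utf-8") as f:
    long_text = f.read()

# Dump the token ids as a plain Python list, which is what the script above loads.
with open("test_input.pkl", "wb") as f:
    pickle.dump(tokenizer.encode(long_text), f)
```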
The original repo works fine:
torch.Size([1, 98796])
torch.int64
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
**The paper investigates the properties of order-divisor graphs associated with finite groups, providing a comprehensive description of**
But with this PR, the output collapses into repetition:
torch.Size([1, 98796])
torch.int64
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
**the 2. 2, the 2. 2, the 2. 2**
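Both runs also print the attention-mask warning. It is probably unrelated here (the input is a single unpadded sequence), but it can be ruled out by passing an explicit mask, continuing from the reproduction script above:

```python
# The input is a single sequence with no padding, so the mask is all ones.
attention_mask = torch.ones_like(input_encoding)
out = model.generate(input_encoding, attention_mask=attention_mask, max_new_tokens=20)
```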
This error appears with long inputs; in my case the input is ~100K tokens long.
@cyrilvallez @zRzRzRzRzRzRzR this may need a double check.
My transformers version: transformers==4.46.0.dev0
Could you check what happens when generating from the raw text instead of importing the input_ids from a file? That is, instead of doing:
import pickle
with open("test_input.pkl", "rb") as f:
    input_ids = pickle.load(f)
do:

with open("text.txt", encoding="utf-8") as f:
    text = f.read()
input_ids = tokenizer.encode(text, return_tensors="pt").to("cuda")
I suspect this may come from slight changes in the tokenizer.
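To check that directly, the two tokenizers could be compared on the same long text; a quick sketch, assuming the PR revision is refs/pr/17 as in the commented-out lines above and that text.txt holds the long input:

```python
from transformers import AutoTokenizer

with open("text.txt", encoding="utf-8") as f:
    text = f.read()

# Original (remote-code) tokenizer vs. the native one from this PR revision.
tok_orig = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat-1m", trust_remote_code=True)
tok_pr = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat-1m", revision="refs/pr/17")

ids_orig = tok_orig.encode(text)
ids_pr = tok_pr.encode(text)

print(len(ids_orig), len(ids_pr))
print(ids_orig == ids_pr)  # True only if both produce identical token ids
```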
A new repository will also be created for this model to host the adapted (native Transformers) version.
@cyrilvallez Hi Cyril, I re-tested the HF-native version as you suggested, and the error remains. The tokenizer seems to behave consistently, so I have no idea where the bug is: https://huggingface.co/THUDM/glm-4-9b-chat-1m-hf/discussions/1.
You can also find the test example I used in the above link.
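In case it helps narrow this down: since the failure only shows up at long lengths, it might be worth diffing the long-context-related config fields between the two repos. A rough sketch (the field names below are guesses covering both config formats; getattr falls back to None where a field is absent):

```python
from transformers import AutoConfig

cfg_orig = AutoConfig.from_pretrained("THUDM/glm-4-9b-chat-1m", trust_remote_code=True)
cfg_hf = AutoConfig.from_pretrained("THUDM/glm-4-9b-chat-1m-hf")

# Print fields that typically affect long-context behavior; None means the
# field does not exist in that config format.
for name in ("rope_theta", "rope_ratio", "rope_scaling", "max_position_embeddings", "seq_length"):
    print(name, getattr(cfg_orig, name, None), getattr(cfg_hf, name, None))
```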