error model.generate()

#13
by NickyNicky - opened

Error screenshots: (two images, not preserved)

code:

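import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-7b-it"
# 4-bit NF4 quantization with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ["HF_TOKEN"])
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},  # place the whole model on GPU 0
    token=os.environ["HF_TOKEN"],
)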

%%time
chat = [
    { "role": "user", "content": "Write a hello world program" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# apply_chat_template already prepends <bos>, so do not add special tokens a second time
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=250)


# Decode and print the output
text = tokenizer.batch_decode(outputs)[0]
print(text)

Facing the same issue here.

Google org

I just tested locally and it works for me. Would you mind sharing the hardware you are running on? cc @ArthurZ @ybelkada in case they have any ideas.

I'm running it on a Colab T4 GPU. Somehow gemma-2b-it runs, but gemma-7b-it throws the error above.

Same issue here with gemma-7b-it:

RuntimeError: shape '[1, 9, 3072]' is invalid for input of size 36864

Somehow the model runs fine on Kaggle: I can use gemma-7b-it there, but it throws the size error on Colab. On the flip side, gemma-2b-it runs fine on Colab, but I don't know how to control the number of output tokens generated. The response is cut off in the middle. For example, for the question "Who are you?", the response I received was "I am a large language model, trained by Google. I am a".

Curiously, the gemma-2b-it model works correctly but the 7b-it and 7b base model does not.
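For anyone wondering why only the 7b checkpoints fail, the numbers in the error message line up with a simple size mismatch. The sketch below is plain arithmetic, not the Transformers code itself. The config values (hidden size, head count, head dimension) are what I understand the published Gemma configs to contain, so treat them as assumptions, and `eager_reshape_sizes` is a made-up helper name. The idea: attention produces `num_heads * head_dim` values per token, and a reshape to `hidden_size` only works when those two numbers are equal.

```python
# Assumed config values for the two Gemma sizes (check config.json to confirm).
GEMMA_7B = {"hidden_size": 3072, "num_heads": 16, "head_dim": 256}
GEMMA_2B = {"hidden_size": 2048, "num_heads": 8, "head_dim": 256}


def eager_reshape_sizes(seq_len, hidden_size, num_heads, head_dim, batch=1):
    """Return (elements produced by attention, elements the reshape expects, ok)."""
    produced = batch * seq_len * num_heads * head_dim
    expected = batch * seq_len * hidden_size
    return produced, expected, produced == expected


# Sequence length 9 matches the shape '[1, 9, 3072]' in the reported error.
print(eager_reshape_sizes(9, **GEMMA_7B))
print(eager_reshape_sizes(9, **GEMMA_2B))
```

With the 7b values the attention output has 36864 elements, which is exactly the "input of size 36864" in the traceback, while a `[1, 9, 3072]` tensor holds only 27648. With the 2b values both counts agree, which would explain why gemma-2b-it is unaffected.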

It does not work on Google Colab with T4, V100, or A100 GPUs.

same here -- gemma-7b and gemma-7b-it both fail on colab with the same error

Google org

Thanks all for reporting! I'm managing to reproduce using torch 2.1.0, but the error doesn't appear if I'm using torch 2.2.0.

Could you share your torch version, upgrade it to 2.2.0 if you are not already on it, and let us know whether that helps?

Google Colab

version error: (screenshot not preserved)

Hey all! The source of the error is a difference between the attention implementations. Any torch version before 2.1.1 falls back to the eager implementation, because sdpa isn't supported in torch for those versions. We will fix the models to work with these versions in transformers ASAP and release a patch. In the meantime, we recommend using a torch version that satisfies torch>=2.1.1 so that the sdpa attention implementation is used, which works correctly.

Here is the necessary line to install the relevant pytorch version in colab:

pip install "torch>=2.1.1" -U

Please restart your runtime afterwards for it to leverage the updated pytorch version!
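If it helps, here is a small sketch for checking whether a given torch version string clears the 2.1.1 threshold mentioned above. It is not part of torch or transformers: `parse_version` and `torch_is_new_enough` are hypothetical helper names, and the parsing is deliberately simple (it only looks at the first three numeric components and ignores build suffixes such as `+cu121`).

```python
import re

# Minimum torch version quoted in this thread for the sdpa attention path.
MIN_SDPA_TORCH = (2, 1, 1)


def parse_version(version: str) -> tuple:
    """Turn a string like '2.2.0+cu121' into (2, 2, 0)."""
    core = version.split("+", 1)[0]  # drop the local build suffix, if any
    nums = [int(p) for p in re.findall(r"\d+", core)[:3]]
    while len(nums) < 3:  # pad short versions such as '2.2'
        nums.append(0)
    return tuple(nums)


def torch_is_new_enough(version: str) -> bool:
    """True if the version meets the threshold from the thread."""
    return parse_version(version) >= MIN_SDPA_TORCH


for v in ("2.1.0+cu121", "2.1.1", "2.2.0+cu121"):
    print(v, torch_is_new_enough(v))
```

In a notebook you would pass `torch.__version__` to `torch_is_new_enough` after restarting the runtime, to confirm the upgrade actually took effect.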

It works with torch 2.2.0.

# This line also upgrades torch to 2.2.0+cu121 automatically, since torchaudio 2.2.0 depends on it

!pip install torchaudio==2.2.0
https://huggingface.co/google/gemma-7b/discussions/17
NickyNicky changed discussion status to closed

Hey all! There's a PR to fix the "eager" attention in Transformers: https://github.com/huggingface/transformers/pull/29187. Once this is merged, we'll do a patch release so that the latest Transformers version on PyPI includes this fix.

cc @ArthurZ

Google org

Patch release is done! Thanks all for the prompt report, and sorry for not catching this sooner! To pick up the fix, run: pip install -U transformers
