RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

#14
by HassanStar - opened

Which version of PyTorch should I use in this case?

Are you using it on a CPU or a GPU?

Did you push the model to GPU before running?

I have the same problem. Any idea how to deal with it?

These are fp16 weights; when running on a CPU it gives this error. When I ran it on a Colab Pro V100 GPU, it worked.

@HassanStar I got the same error with Torch 2.1.2 on a Mac when I tried to put the model on the CPU, but if I use torch.set_default_device("mps") to use Metal acceleration, it works just fine.
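For reference, a minimal sketch of that MPS route (assuming an Apple-silicon Mac and the microsoft/phi-2 checkpoint; the dtype handling here is illustrative, not an official recipe):

import torch
from transformers import AutoModelForCausalLM

# Make Metal (mps) the default device so the fp16 weights and any new tensors
# are created on the GPU, which does have a Half LayerNorm kernel.
torch.set_default_device("mps")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)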

Hello everyone!

CPU with FP16 does not work since there is no CPU FP16 LayerNormalization kernel implementation in PyTorch.
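A minimal repro of the missing kernel, independent of the model (whether it raises depends on your PyTorch version and build):

import torch

x = torch.randn(2, 8)
ln = torch.nn.LayerNorm(8)

# float32 LayerNorm has a CPU kernel, so this works.
print(ln(x).dtype)

# Casting the module and input to fp16 on the CPU raises, on affected builds:
# RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
print(ln.half()(x.half()).dtype)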

Best regards,
Gustavo.


Did you push the model to GPU before running?

How do I do this?

Hi @andreariboni, if you have an Nvidia GPU you can do model.to("cuda"), or if you are working on Apple silicon, do model.to("mps"). By the way, don't forget to do the same to the inputs.
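A rough example of what that looks like end to end (assuming an Nvidia GPU; swap "cuda" for "mps" on Apple silicon):

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "mps" on Apple silicon

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# Move the tokenized inputs to the same device as the model.
inputs = tokenizer("def print_prime(n):", return_tensors="pt", return_attention_mask=False).to(device)
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.batch_decode(outputs)[0])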

Hi all,
For CPU you can use the code below; loading the weights in torch.float32 avoids the missing fp16 LayerNorm kernel.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cpu")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float32, device_map="cpu", trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
"""
Print all primes between 1 and n
"""''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)

