The command `outputs = model.generate(inputs)` throws the error `RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'`
#47 · by ayaan-k1 · opened
I have loaded the model in 8-bit so as to save memory:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder", force_download=True)
# For fp16, replace load_in_8bit=True with torch_dtype=torch.float16.
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder", load_in_8bit=True, device_map={"": "cuda"}
)
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")
```
Any solutions?
It would be better if you also included the code you ran and the full traceback; the problem would likely become clearer.
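For reference, this error usually means a half-precision `LayerNorm` was dispatched to the CPU, whose kernel historically had no `float16` implementation (with `load_in_8bit=True` this can happen when some layers are silently offloaded to CPU). A minimal, self-contained sketch of the failure mode and the usual workaround, assuming an older PyTorch build without a CPU fp16 `LayerNorm` kernel:

```python
import torch

# fp16 LayerNorm on the CPU: this is the configuration the error complains about.
layer = torch.nn.LayerNorm(4).half()   # fp16 weights, still on CPU
x = torch.randn(1, 4).half()           # fp16 input, still on CPU

try:
    layer(x)  # may raise RuntimeError on older PyTorch builds
except RuntimeError as err:
    print("CPU fp16 LayerNorm failed:", err)

# Workaround: keep half precision on the GPU only, or cast back to
# float32 for anything that runs on the CPU.
result = layer.float()(x.float())      # float32 works everywhere on CPU
print(result.dtype)
```

The same idea applies to the original snippet: make sure every module placed by `device_map` actually lands on the GPU (or load with `torch_dtype=torch.float32` when running on CPU), and move the tokenized inputs to `model.device` before calling `generate`.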
loubnabnl changed discussion status to closed