I am trying to load a model using `AutoModelForCausalLM.from_pretrained`. When 8-bit or 4-bit loading is enabled, loading is very slow and the `generate` function hangs. With bf16 it is fine. Any idea?