Please share a working code sample and the exact versions of the required packages.

#1 opened by Alexander-Minushkin

I am receiving the following error:

ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

Installed packages:

transformers 4.37.1
bitsandbytes 0.42.0
scipy 1.12.0
accelerate 0.26.1
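
For reproducibility, the same versions can be pinned explicitly (assuming a plain pip install from PyPI):

pip install transformers==4.37.1 bitsandbytes==0.42.0 scipy==1.12.0 accelerate==0.26.1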

Code:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda:3" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("Mistral-7B-v0.1-int8") 
tokenizer = AutoTokenizer.from_pretrained("Mistral-7B-v0.1-int8")


messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)  # this is the call that raises the ValueError quoted above

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
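
For what it's worth, below is a minimal variant of the same script that skips the model.to(device) call and instead asks from_pretrained to place the weights via device_map. This is only a sketch, assuming the Mistral-7B-v0.1-int8 checkpoint already carries its bitsandbytes quantization config and that device_map placement is acceptable here:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda:3"  # the device to load the model onto

# Let from_pretrained/accelerate place the already-quantized weights
# instead of calling model.to() afterwards
model = AutoModelForCausalLM.from_pretrained("Mistral-7B-v0.1-int8", device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Mistral-7B-v0.1-int8")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Only the input ids are moved to the device; the quantized model is never moved with .to()
model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])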
