Not able to load via transformers

#16
by balu548411 - opened

Hi bro, I am a newbie to QLoRA. I tried the code below and it raises an OSError. Can you tell me how to load and use this model in Python?
```
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TheBloke/guanaco-65B-GPTQ")

model = AutoModelForCausalLM.from_pretrained("TheBloke/guanaco-65B-GPTQ")
```


```
OSError                                   Traceback (most recent call last)
in <cell line: 5>()
      3 tokenizer = AutoTokenizer.from_pretrained("TheBloke/guanaco-65B-GPTQ")
      4
----> 5 model = AutoModelForCausalLM.from_pretrained("TheBloke/guanaco-65B-GPTQ")

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   2553                 )
   2554             else:
-> 2555                 raise EnvironmentError(
   2556                     f"{pretrained_model_name_or_path} does not appear to have a file named"
   2557                     f" {_add_variant(WEIGHTS_NAME, variant)}, {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or"

OSError: TheBloke/guanaco-65B-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
```

You can't load GPTQ models with regular transformers; you need AutoGPTQ:

```
pip install auto-gptq
```
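
The OSError itself is expected: GPTQ repos don't ship a pytorch_model.bin, only quantized .safetensors weights, so transformers finds nothing it knows how to load. If you want to confirm that for yourself, you can list the repo's files with huggingface_hub (a small sketch; the filename in the comment is illustrative):

```
# List the files in the model repo - note there is no pytorch_model.bin,
# just the quantized .safetensors file that AutoGPTQ knows how to load.
from huggingface_hub import list_repo_files

print(list_repo_files("TheBloke/guanaco-65B-GPTQ"))
# e.g. [..., 'Guanaco-65B-GPTQ-4bit.act-order.safetensors', ...]
```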

Here is example code:

```
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"
# Must match the weights filename in the repo, minus the .safetensors extension
model_basename = "Guanaco-65B-GPTQ-4bit.act-order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
prompt_template = f'''### Instruction: {prompt}
### Response:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```
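
One small note on the generate example: `tokenizer.decode(output[0])` will echo the prompt back along with the response. If you only want the newly generated text, you can slice off the prompt tokens first (a minimal sketch building on the variables above):

```
# Decode only the tokens generated after the prompt
new_tokens = output[0][input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```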

Thank you bro 😊😊

First of all, thanks a lot for your work!
I encountered an issue which is directly caused by the following code:

```
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        device_map="auto",
        trust_remote_code=True,
        device="cuda",
        use_triton=use_triton,
        quantize_config=None)
```

It first warns me:

```
WARNING 2023-07-03 22:36:45,587-1d: CUDA extension not installed.
....
WARNING 2023-07-03 22:36:58,012-1d: The safetensors archive passed at /home/mydir/.cache/huggingface/hub/models--TheBloke--guanaco-65B-GPTQ/snapshots/c1a31c76e7228a13bc542b25243b912f12e39c87/Guanaco-65B-GPTQ-4bit.act-order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
```

After a huge amount of output about device_map, it raises the following error:
```
C++ Traceback (most recent call last):
No stack trace in paddle, may be caused by external reasons.

Error Message Summary:
FatalError: Access to an undefined portion of a memory object is detected by the operating system.
[TimeInfo: *** Aborted at 1688395081 (unix time) try "date -d @1688395081" if you are using GNU date ***]
[SignalInfo: *** SIGBUS (@0x7fbce9c3dff0) received by PID 424101 (TID 0x7fbea6e7e740) from PID 18446744073336512496 ***]
```

![image.png](https://cdn-uploads.huggingface.co/production/uploads/6033ae93b5883695ce9d0918/LsYSFyg909xJjyhnCxeCj.png)

I'm pretty sure that I have my cudatoolkit installed. Do you have any clue about the problem?
Again, thanks for your work, and I hope for your reply.

Firstly, just to check: you're running this on a system with an Nvidia GPU available, with at least 48GB of VRAM?

If so, the first problem is that the CUDA extension is not installed. Please try re-installing auto-gptq with:

```
pip3 uninstall -y auto-gptq
GITHUB_ACTIONS=true pip3 install auto-gptq
```

(As I understand it, setting GITHUB_ACTIONS=true tells auto-gptq's installer to build the CUDA extension rather than skip it.)

Not sure about the rest; let's see if installing AutoGPTQ with the CUDA module available fixes that first.
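
Once that's done, a quick sanity check that the GPU is visible and auto-gptq imports cleanly (rough sketch; if the "CUDA extension not installed" warning still appears on import, the extension still didn't build):

```
# Sanity check after reinstalling auto-gptq
import torch
from auto_gptq import AutoGPTQForCausalLM  # should import without the CUDA warning

print(torch.cuda.is_available())      # expect True
print(torch.cuda.get_device_name(0))  # your Nvidia GPU
```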


Thank you so much
