AttributeError: 'Offload_LlamaModel' object has no attribute 'preload'
I'm using text-generation-webui and the model loads, but as soon as I start a chat, I get this error:
Traceback (most recent call last):
File "J:\oobabooga_windows\text-generation-webui\modules\text_generation.py", line 249, in generate_reply_HF
output = shared.model.generate(**generate_params)[0]
File "J:\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling_base.py", line 423, in generate
return self.model.generate(**kwargs)
File "J:\oobabooga_windows\installer_files\env\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "J:\oobabooga_windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1572, in generate
return self.sample(
File "J:\oobabooga_windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2619, in sample
outputs = self(
File "J:\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "J:\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 688, in forward
outputs = self.model(
File "J:\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "J:\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\llama_inference_offload.py", line 135, in forward
if idx <= (self.preload - 1):
File "J:\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Offload_LlamaModel' object has no attribute 'preload'
Output generated in 0.02 seconds (0.00 tokens/s, 0 tokens, context 42, seed 1859993720)
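For reference, the check that fails lives in GPTQ-for-LLaMa's CPU-offload code path (llama_inference_offload.py), which expects a preload attribute telling it how many layers stay on the GPU; in this configuration that attribute apparently never gets set. A minimal sketch of the failing logic and a defensive workaround (only the `preload` name and the `idx <= (self.preload - 1)` comparison come from the traceback; the rest is an assumption for illustration):

```python
# Sketch of the offload check implied by the traceback. The class layout
# here is hypothetical; only `preload` and the comparison are from the log.
class OffloadSketch:
    def __init__(self, preload=0):
        # preload = number of decoder layers kept resident on the GPU
        # (the webui's pre_layer setting). If the loader never sets it,
        # forward() raises the AttributeError shown above.
        self.preload = preload

    def run_layer(self, idx):
        # Defensive variant: treat a missing attribute as 0 preloaded layers.
        preload = getattr(self, "preload", 0)
        if idx <= preload - 1:
            pass  # layer is already on the GPU
        else:
            pass  # would move this layer from CPU to GPU before running it
```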
The GGML model works fine but is slow, so the GPTQ version would be preferable. How can this be fixed?
OK, I've fixed it. I deleted the quantize_config.json file, and it now works fine with these parameters:
cpu_memory: 0
auto_devices: false
disk: false
cpu: false
bf16: false
load_in_8bit: false
trust_remote_code: false
load_in_4bit: false
compute_dtype: float16
quant_type: nf4
use_double_quant: false
gptq_for_llama: false
wbits: 4
groupsize: 128
model_type: llama
pre_layer: 0
triton: false
desc_act: false
threads: 0
n_batch: 512
no_mmap: false
mlock: false
n_gpu_layers: 0
n_ctx: 2048
llama_cpp_seed: 0.0
gpu_memory_0: 0
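For what it's worth, the three GPTQ params here (wbits: 4, groupsize: 128, desc_act: false) are exactly what AutoGPTQ would otherwise read from quantize_config.json, so the same model can also be loaded outside the webui with an explicit config instead of the file. A rough sketch, assuming a recent AutoGPTQ and a placeholder model path:

```python
# Sketch: load the GPTQ model with AutoGPTQ, passing the quantization
# parameters explicitly so no quantize_config.json is needed on disk.
# The model path below is a placeholder for wherever the model lives.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,          # matches wbits: 4
    group_size=128,  # matches groupsize: 128
    desc_act=False,  # matches desc_act: false
)

model = AutoGPTQForCausalLM.from_quantized(
    "J:/models/my-llama-gptq",        # placeholder path
    quantize_config=quantize_config,  # takes the place of the JSON file
    use_safetensors=True,
    device="cuda:0",
)
```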
OK, that's odd. I have no idea why deleting quantize_config.json would help, as the params you've set there are exactly the same as the ones that were in quantize_config.json.
It might have been a text-generation-webui bug, i.e. some other param you had set was breaking things, and either you changed that param at the same time, or not having quantize_config.json caused it not to break on that invalid param.
Glad it's working now, but FYI, quantize_config.json is a file you want and should never need to delete.
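If anyone else deleted theirs and wants it back, the file is tiny and easy to recreate. A rough sketch of what a typical one looks like for a 4-bit, group-size-128 model (field names follow AutoGPTQ's BaseQuantizeConfig; exact contents vary by version, so treat the values as illustrative):

```python
import json

# Illustrative reconstruction of a typical quantize_config.json for a
# 4-bit, group-size-128 GPTQ model. Values here are examples, not the
# exact contents of the deleted file.
quantize_config = {
    "bits": 4,
    "group_size": 128,
    "desc_act": False,
    "sym": True,
    "true_sequential": True,
    "damp_percent": 0.01,
}

with open("quantize_config.json", "w") as f:
    json.dump(quantize_config, f, indent=2)
```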