Error when using this model
I have CUDA 11.8, and my setup works in ooba textgen with llama-30b-supercot-4g-128, but when I run this model I get the following error:
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.layers.0.self_attn.k_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.0.self_attn.o_proj.qzeros: copying a param with shape torch.Size([1, 832]) from checkpoint, the shape in current model is torch.Size([52, 832]).
size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([1, 6656]) from checkpoint, the shape in current model is
[truncated for visibility]
I believe this is because my GPTQ version does not match the one the checkpoint was quantized with.
How do we determine which GPTQ version to use?
For what it's worth, I initially got this error too, and I had to edit my config-user.yaml file to change groupsize to None under the VicUnlocked-30B-LoRA-GPTQ section.
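Something like this; only the groupsize change is from this thread, while the surrounding keys (wbits, model_type) are guesses at a typical config-user.yaml entry:

```yaml
# config-user.yaml (sketch -- only "groupsize: None" is the actual fix;
# the other keys are assumed for illustration)
VicUnlocked-30B-LoRA-GPTQ:
  wbits: 4
  groupsize: None
  model_type: llama
```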
That solved my problem; I am just an idiot. Set groupsize to None.
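For anyone else who hits this: the mismatched first dimension in the traceback is the number of quantization groups. qzeros and scales hold one row per group, so torch.Size([1, 832]) means the checkpoint was quantized with no grouping, while the model was being built expecting torch.Size([52, 832]), i.e. groupsize 128 (6656 / 128 = 52). Here is a rough sketch for checking what a checkpoint actually expects, assuming a plain PyTorch state dict; the file name is illustrative:

```python
import torch

# Load the quantized checkpoint on CPU (assumes a torch.save()'d state dict).
state_dict = torch.load("VicUnlocked-30B-LoRA-GPTQ.pt", map_location="cpu")

# qzeros/scales carry one row per quantization group:
#   rows == 1                         -> quantized without grouping (groupsize None/-1)
#   rows == in_features // groupsize  -> grouped quantization
qzeros = state_dict["model.layers.0.self_attn.k_proj.qzeros"]
rows = qzeros.shape[0]
in_features = 6656  # LLaMA-30B hidden size, per the shapes in the traceback

if rows == 1:
    print("no grouping: set groupsize to None")
else:
    print(f"groupsize {in_features // rows}")  # e.g. 6656 // 52 == 128
```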