Unable to load: size mismatches
```
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.layers.0.self_attn.k_proj.qzeros: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([32, 512]).
```
And a whole bunch more like it. It might be the model or it might be my code (loading the quantized 7B with 128 groupsize works fine), but since I'm still learning, I can't say for sure.
Sounds like it's trying to load it as a 128 group size model? This one is 32g.
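You can actually read that off the traceback: the first dimension of `qzeros` is one row per quantization group, i.e. `in_features / groupsize`. A quick sanity check, assuming the usual LLaMA-7B hidden size of 4096 for `k_proj`:

```python
# qzeros has one row per quantization group: in_features / groupsize.
# Assuming the LLaMA-7B hidden size of 4096 for k_proj:
in_features = 4096

print(in_features // 32)   # 128 -> matches the checkpoint shape (a 32g model)
print(in_features // 128)  # 32  -> matches what your code built (a 128g model)
```

So the checkpoint is 32g, but the model your code constructed expects 128g.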
How are you loading this? A UI, or the GPTQ library by itself? Try renaming the model file to "4bit-32g.safetensors" and loading again.
And you are absolutely right, of course. I use a modified version of the _load_quant function from oobabooga, and while I did set the groupsize to 32 in my startup parameters, it seems it doesn't get passed along somewhere. Sorry about that, completely my fault.
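For anyone else who lands here: a minimal sketch of what the fix looks like. The call below follows the GPTQ-for-LLaMa style load_quant; the exact argument names, order, and import path vary between forks and webui versions, so treat the paths and names here as placeholders and check your own loader.

```python
# Minimal sketch, assuming a GPTQ-for-LLaMa style load_quant.
# Import path and paths below are illustrative, not exact.
from llama import load_quant  # from the GPTQ-for-LLaMa repo (assumed layout)

model = load_quant(
    "models/llama-13b",              # base model dir with the HF config (placeholder)
    "models/4bit-32g.safetensors",   # the quantized checkpoint from this repo
    wbits=4,
    groupsize=32,                    # must match how the checkpoint was quantized
)
```

The key point is simply that `groupsize` has to reach the function that builds the quantized layers; if it falls back to a default (often -1 or 128) anywhere along the way, you get exactly the size mismatch above.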