Can I use this model in text-generation-webui?
title
I tried yesterday. It has its own inference mode, and the model won't get recognized.
I couldn't make it work on Colab; maybe someone else can. Maybe tomorrow I can try a local install.
But it seems you need santacoder_inference to make it work. I don't think ooba supports it yet.
python -m santacoder_inference bigcode/starcoder --wbits 4 --load starcoder-GPTQ-4bit-128g/model.pt
hey
@CyberTimon
it should work.
I tried the inference with santacoder using this command a few days ago and it was working.
I haven't tried starcoder yet, but I don't see a reason why it shouldn't work.
The webui won't work though.
I also cannot figure out how to make this thing work. Even running python -m santacoder_inference bigcode/starcoder --wbits 4 --load ../models/starcoder-GPTQ-4bit-128g/model.pt just tries to download the model files again.
I already downloaded starcoder-GPTQ-4bit-128g/model.pt.
Can anyone point me in the right direction? If I can figure it out, I'll write a guide on what I did.
-edit- I used GPT-4 to help me rearrange everything until I could run it. The main issue seems to be the config.json file. I keep getting the errors below (I didn't paste them all); GPT-4 says it's because my config.json is from the original starcoder and not from the GPTQ 4-bit 128g version. Still looking into how to solve this issue. Maybe I'm just an idiot? Will confirm soon.
for-SantaCoder$ python santacoder_inference.py bigcode/starcoder --wbits 4 --load /mnt/i/ai/text-generation-webui/models/starcoder-GPTQ-4bit-128g/model.pt
Traceback (most recent call last):
File "/mnt/i/ai/text-generation-webui/repositories/GPTQ-for-SantaCoder/santacoder_inference.py", line 114, in <module>
main()
File "/mnt/i/ai/text-generation-webui/repositories/GPTQ-for-SantaCoder/santacoder_inference.py", line 104, in main
model = get_santacoder(args.model, args.load, args.wbits)
File "/mnt/i/ai/text-generation-webui/repositories/GPTQ-for-SantaCoder/santacoder_inference.py", line 58, in get_santacoder
model.load_state_dict(state_dict_original)
File "/home/kcramp/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2056, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPTBigCodeForCausalLM:
Unexpected key(s) in state_dict: "transformer.h.0.attn.c_attn.zeros", "transformer.h.0.attn.c_proj.zeros", "transformer.h.0.mlp.c_fc.zeros", "transformer.h.0.mlp.c_proj.zeros", "transformer.h.1.attn.c_attn.zeros", "transformer.h.1.attn.c_proj.zeros", "transformer.h.1.mlp.c_fc.zeros", "transformer.h.1.mlp.c_proj.zeros", "transformer.h.2.attn.c_attn.zeros", "transformer.h.2.attn.c_proj.zeros", "transformer.h.2.mlp.c_fc.zeros", "transformer.h.2.mlp.c_proj.zeros", "transformer.h.3.attn.c_attn.zeros", "transformer.h.3.attn.c_proj.zeros", "transformer.h.3.mlp.c_fc.zeros", "transformer.h.3.mlp.c_proj.zeros", "transformer.h.4.attn.c_attn.zeros", "transformer.h.4.attn.c_proj.zeros", "transformer.h.4.mlp.c_fc.zeros", "transformer.h.4.mlp.c_proj.zeros", ...
size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([768, 6400]) from checkpoint, the shape in current model is torch.Size([6400, 6144]).
size mismatch for transformer.h.0.attn.c_proj.weight: copying a param with shape torch.Size([768, 6144]) from checkpoint, the shape in current model is torch.Size([6144, 6144]).
size mismatch for transformer.h.0.mlp.c_fc.weight: copying a param with shape torch.Size([768, 24576]) from checkpoint, the shape in current model is torch.Size([24576, 6144]).
size mismatch for transformer.h.0.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 6144]) from checkpoint, the shape in current model is torch.Size([6144, 24576]).
size mismatch for transformer.h.1.attn.c_attn.weight: copying a param with shape torch.Size([768, 6400]) from checkpoint, the shape in current model is torch.Size([6400, 6144]).
size mismatch for transformer.h.1.attn.c_proj.weight: copying a param with shape torch.Size([768, 6144]) from checkpoint, the shape in current model is torch.Size([6144, 6144]).
size mismatch for transformer.h.1.mlp.c_fc.weight: copying a param with shape torch.Size([768, 24576]) from checkpoint, the shape in current model is torch.Size([24576, 6144]).
size mismatch for transformer.h.1.mlp.c_proj.weight: copying a param with shape torch.Size([3072, 6144]) from checkpoint, the shape in current model is torch.Size([6144, 24576]).
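A quick sanity check on the shapes in the traceback above may help anyone reading along. The sketch below assumes the 4-bit values are packed along the input dimension into 32-bit words, which is the convention in GPTQ-for-LLaMa-style code; that is an assumption about this checkpoint, and packed_rows is a hypothetical helper, not something from the repo. Under that assumption the checkpoint shapes come out exactly as reported, which suggests the file holds packed 4-bit weights that are being copied into layers still shaped for fp16.

```python
# Assumption: wbits-bit values are packed along the input dimension into
# 32-bit words (GPTQ-for-LLaMa-style layout). packed_rows is a hypothetical
# helper written for this check, not part of GPTQ-for-SantaCoder.

def packed_rows(in_features: int, wbits: int, word_bits: int = 32) -> int:
    """Rows needed to store in_features values of wbits bits each."""
    return in_features * wbits // word_bits

# (in_features, out_features), read off the "shape in current model" entries
# in the traceback, where fp16 Linear weights are [out_features, in_features].
layers = {
    "attn.c_attn": (6144, 6400),
    "attn.c_proj": (6144, 6144),
    "mlp.c_fc": (6144, 24576),
    "mlp.c_proj": (24576, 6144),
}

for name, (in_f, out_f) in layers.items():
    # Compare each printed shape with the "from checkpoint" shape in the error.
    print(name, [packed_rows(in_f, wbits=4), out_f])
```

The four printed shapes can be compared line by line with the "copying a param with shape" values in the error message.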
@kcramp858
yeah, I think that will download the model files from the original repo for the model again.
Let me outline how this works:
The model is loaded in fp16 and then we inject the int8/int4 weights into the model.
Regarding the error you are seeing, I am unsure. I will investigate.
Thanks for trying it out :)
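For readers who want a picture of the "load in fp16, then inject" flow described above, here is a toy sketch in plain Python. Everything in it (DenseLayer, PackedLayer, inject_packed, load_strict) is a made-up stand-in and not the GPTQ-for-SantaCoder API; it only tracks parameter names and shapes. It assumes the packed layout of 32-bit words along the input dimension, and it does not model the shape of the zeros tensor.

```python
# Toy illustration only. DenseLayer, PackedLayer, inject_packed and
# load_strict are hypothetical names, not the GPTQ-for-SantaCoder API.
from dataclasses import dataclass


@dataclass
class DenseLayer:
    in_features: int
    out_features: int

    def expected_shapes(self):
        # fp16 Linear layout: [out_features, in_features]
        return {"weight": (self.out_features, self.in_features)}


@dataclass
class PackedLayer:
    in_features: int
    out_features: int
    wbits: int

    def expected_shapes(self):
        rows = self.in_features * self.wbits // 32  # assumed packing
        # None means "key is expected, shape not modeled in this sketch".
        return {"weight": (rows, self.out_features), "zeros": None}


def inject_packed(model, wbits):
    """Swap every dense layer for a packed layer of the same logical size."""
    return {
        name: PackedLayer(layer.in_features, layer.out_features, wbits)
        for name, layer in model.items()
    }


def load_strict(model, checkpoint):
    """Return the problems a strict load would report (empty means OK)."""
    problems = []
    for key, shape in checkpoint.items():
        layer_name, param = key.rsplit(".", 1)
        expected = model[layer_name].expected_shapes()
        if param not in expected:
            problems.append(f"unexpected key: {key}")
        elif expected[param] is not None and expected[param] != shape:
            problems.append(f"size mismatch for {key}")
    return problems


fp16_model = {"h.0.attn.c_proj": DenseLayer(6144, 6144)}
checkpoint = {
    "h.0.attn.c_proj.weight": (768, 6144),  # shape taken from the traceback
    "h.0.attn.c_proj.zeros": None,          # shape not modeled here
}

print(load_strict(fp16_model, checkpoint))                      # two problems
print(load_strict(inject_packed(fp16_model, 4), checkpoint))    # no problems
```

In this toy, loading the quantized checkpoint into the untouched fp16 layout reproduces the same two kinds of complaint seen in the traceback, and they disappear once the layers are swapped first.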
Hey guys, sorry.
I have fixed the bug.
Context: I was debugging something and had accidentally hardcoded groupsize to -1.
Can you try specifying --groupsize 128 for starcoder during inference? I just tried and it worked for me :)
Please note for santacoder, you should specify -1.
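To make the difference between the two groupsize values concrete, here is a minimal sketch. It assumes the usual GPTQ convention, where groupsize is the number of input features that share one set of quantization parameters and -1 means a single group spanning the whole input dimension. num_groups is a hypothetical helper, not a function from the repo, and 6144 is the starcoder input width taken from the traceback earlier in the thread.

```python
# Assumption: usual GPTQ meaning of groupsize; -1 = one group over the whole
# input dimension. num_groups is a hypothetical helper, not the repo's API.

def num_groups(in_features: int, groupsize: int) -> int:
    """How many sets of quantization parameters one layer would carry."""
    if groupsize == -1:
        return 1
    if groupsize <= 0:
        raise ValueError("groupsize must be -1 or a positive integer")
    return -(-in_features // groupsize)  # ceiling division


for groupsize in (128, -1):
    print(groupsize, num_groups(6144, groupsize))
```

If that convention holds here, a checkpoint quantized with groups of 128 carries many more quantization parameters per layer than a loader hardcoded to -1 expects, so the two settings cannot be mixed.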
Fixed in the latest commit: https://github.com/mayank31398/GPTQ-for-SantaCoder/commit/40df38b03e4ebdaf9e5a444e9f7b4b6df79cff39
Please pull the changes :)
can we close this?
Sure, but it still doesn't work in oobabooga. I don't think you can easily change that, though, as it has to do with the architecture of the model.
@CyberTimon I've tried this model: https://huggingface.co/GeorgiaTechResearchInstitute/starcoder-gpteacher-code-instruct
It works with oobabooga but it's huge and slow.
yeah, you will need to quantize that model yourself.
You can take a look at the scripts provided in my repo.
Would this model run on oobabooga if quantized? I've not done anything like this yet. How long would this take and would it be possible with a normal PC and 3090 GPU?
I am not sure.
I am planning to add a quantized version of starchat by this week too.