GPTQ and AWQ Support for ZeroGPU

#117
by akhil2808 - opened
ZeroGPU Explorers org
edited 9 days ago


Hey, I was wondering if ZeroGPU supports AWQ and GPTQ quantization, considering that they are dedicated GPU quantization types. I tried a lot of different ways to host my Qwen2-VL 72B Instruct AWQ model, but nothing seems to be working. If anyone could lend me a hand on this issue, I would be really thankful.
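
For reference, a minimal sketch of the usual ZeroGPU loading pattern for an AWQ checkpoint. The repo ID, the autoawq requirement, and the generation details are assumptions rather than the actual Space code from this thread, and for a 72B vision model this may still fail to fit in VRAM:

```python
# Sketch of a naive ZeroGPU attempt at hosting the AWQ checkpoint.
# Assumptions: running inside a ZeroGPU Space, `autoawq` is in requirements.txt,
# and the repo ID below is the one from the title. On ZeroGPU the GPU is only
# attached inside the @spaces.GPU function, which is part of why quantized
# backends that expect CUDA are awkward here.
import spaces
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-VL-72B-Instruct-AWQ"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",  # ~36 GB of 4-bit weights even before activations/KV cache
)

@spaces.GPU(duration=120)  # GPU is attached only for the duration of this call
def generate(prompt: str) -> str:
    # Text-only call for brevity; a real vision demo would also pass images.
    inputs = processor(text=[prompt], return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    return processor.batch_decode(output, skip_special_tokens=True)[0]
```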

akhil2808 changed discussion status to closed
akhil2808 changed discussion status to open
ZeroGPU Explorers org


I'll try to help debug it if I have the code.
However, it's not always possible to fix, since the specifications have changed considerably from the previous ZeroGPU Spaces...

ZeroGPU Explorers org

Also, the main question is: are GPTQ and AWQ formats even supported by ZeroGPU?

I committed a version that boots.
However, inference does not work.

Maybe it would work if the entire AWQ model were small enough to load into CUDA, but when I tried that with the 70B model, it crashed due to lack of VRAM. 🤢
A similar algorithm that came out recently managed to work in a ZeroGPU Space. I'm not sure which one it was...

Edit:
I remember now, it was AQLM.
https://discuss.huggingface.co/t/error-running-model-in-zerogpu/109819

ZeroGPU Explorers org

Also, the main question is: are GPTQ and AWQ formats even supported by ZeroGPU?

The format in itself is supported anyway.
What's probably not supported is loading 72B vision models on ZeroGPU,
quantized or not.
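
One quick sanity check before trying to load a checkpoint like this is to sum the sizes of its weight shards on the Hub, which gives a lower bound on the VRAM the weights alone will need (a sketch using huggingface_hub; the repo ID is assumed from the original post):

```python
# Rough size check: sum the safetensors shards of the quantized repo on the Hub.
# The weights must fit in the ZeroGPU VRAM slice with room left for activations
# and the KV cache, so this is only a lower bound on what's needed.
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("Qwen/Qwen2-VL-72B-Instruct-AWQ", files_metadata=True)  # assumed repo ID
weight_bytes = sum(
    (f.size or 0) for f in info.siblings if f.rfilename.endswith(".safetensors")
)
print(f"Quantized weights on disk: {weight_bytes / 1e9:.1f} GB")
```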

ZeroGPU Explorers org

@xi0v Oh, is there any rule like that? You can't load models beyond a certain parameter count? Because this is less than a 13-billion-parameter model, which I think is small enough to fit on the 80GB of VRAM the A100 ZeroGPU uses under the hood.

ZeroGPU Explorers org

@John6666 I don't think a 13-billion-parameter model should be throwing an OOM error on an 80GB A100; that's unlikely.

I thought the available VRAM was 40 GB? It's 80GB on the GPU specs, though.

ZeroGPU Explorers org
edited 9 days ago

@xi0v Oh, is there any rule like that? You can't load models beyond a certain parameter count? Because this is less than a 13-billion-parameter model, which I think is small enough to fit on the 80GB of VRAM the A100 ZeroGPU uses under the hood.

Well, ZeroGPU is limited in terms of computational power (hence it being free but with a quota), and ZeroGPU uses a 40GB A100, not an 80GB one, if I recall correctly. 13B models work with no problem. What you tried to use is a 72B model with vision capabilities (which makes it need even more computational power to run).
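
A rough back-of-envelope estimate (my own approximate numbers, not figures from this thread) of why a 4-bit 72B vision model is a tight fit for a ~40GB slice:

```python
# Very rough VRAM estimate for a 72B model quantized to 4 bits (AWQ/GPTQ).
# All numbers below are approximations, not measurements.
params = 72e9
weights_gb = params * 0.5 / 1e9   # ~0.5 bytes per parameter at 4-bit -> ~36 GB
kv_cache_gb = 3.0                 # KV cache for a modest context length (guess)
runtime_gb = 2.0                  # CUDA context, vision tower activations, buffers (guess)
total_gb = weights_gb + kv_cache_gb + runtime_gb
print(f"~{total_gb:.0f} GB needed vs ~40 GB available on the slice")
```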
