Your quants are not listed in the base model
Hey Kaitchup,
First I want to thank you for the work you do.
I have 2x AMD MI100, and your quants are the fastest ones that actually work on that device, especially for prompt processing.
For example, with your quantization of Llama-3.3-70B-Instruct in vLLM, I get 391 t/s prompt eval and 19 t/s generation.
The closest other quant gives me only 320 t/s prompt eval.
A big plus is that you do not use BF16, which is not supported on the AMD MI60 and MI100.
Back to the topic.
I look for quants by going to the base model card on Hugging Face and clicking on "Quantizations",
and your quants are not in that list.
There is another account that uses exactly the same names for its quants, but those are BF16 and always fail on the MI100.
Your quants were recommended on Reddit without a full link, and I was puzzled when the ones I found at first didn't work.
I could only find your versions through Google.
Hi,
I don't know how Hugging Face builds that list. I thought it was done automatically, but if my models are not there, something must be misconfigured.
Thank you for letting me know.
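For anyone hitting the same issue: as far as I understand, the "Quantizations" list on a base model card is populated from the quantized model's own card metadata. A minimal sketch of the YAML front matter at the top of the quantized repo's README.md might look like this (the exact repo names here are placeholders):

```yaml
# YAML front matter of the quantized model's README.md
# base_model must point at the original model's repo id
base_model: meta-llama/Llama-3.3-70B-Instruct
# tells the Hub how this repo relates to the base model
base_model_relation: quantized
```

If `base_model` is missing, points at the wrong repo, or `base_model_relation` is absent, the Hub may not be able to file the quant under the base model's "Quantizations" tab.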