Won't fit on 48GB VRAM

#1
by Adzeiros - opened

Hey, curious because with your config this won't load on 48GB of VRAM... I run the server's display output off the integrated GPU, so both 3090s are idling at 0 VRAM usage...

However, even lowering the chunk_size to 512 won't let it load; I still go OOM.

Looks like I'll need to lower the context from 32k to around 28k, but I'll keep testing to find the max... Curious if you see the same? I'm running it on the latest version of TabbyAPI.
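For anyone wanting to estimate how much dropping the context actually saves: the KV cache grows linearly with sequence length, so the savings are easy to ballpark. A minimal sketch below, using hypothetical hyperparameters for a 70B-class GQA model (80 layers, 8 KV heads, head dim 128) — substitute your model's actual values from its config.json:

```python
# Back-of-envelope KV-cache size: 2 tensors (K and V) per layer.
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 70B-class GQA model: 80 layers, 8 KV heads, head dim 128, FP16 cache.
gib = 1024 ** 3
print(kv_cache_bytes(80, 8, 128, 32768) / gib)  # cache at 32k context -> 10.0 GiB
print(kv_cache_bytes(80, 8, 128, 16384) / gib)  # halving context halves it -> 5.0 GiB
```

This is only the cache; the weights, activations, and per-GPU overhead sit on top of it, which is why a model that "should" fit on paper can still OOM at load time.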

-EDIT- Okay, I got it to load with 16k context... I'll test it out, but I might prefer a 4.85bpw quant to get the full 32k context.
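For reference, the knobs involved here all live in TabbyAPI's config.yml. A sketch of the relevant section as I recall it — key names and the cache_mode values are from TabbyAPI's sample config and may differ in your version, so check your local config.yml:

```yaml
model:
  # Reduce from 32768 if the full context OOMs on 2x 3090 (48GB total)
  max_seq_len: 16384
  # Prompt-processing batch size; lowering trades speed for peak VRAM
  chunk_size: 512
  # Quantizing the KV cache (e.g. Q8 or Q4 instead of FP16) can win back
  # enough VRAM to raise max_seq_len again
  cache_mode: Q8
```

Cache quantization is often the cheaper trade than dropping to a lower-bpw weight quant, since it mostly affects long-context recall rather than general output quality.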
