Won't fit on 48GB VRAM
#1
by
Adzeiros
- opened
Hey, curious — using your config for this, it won't load on 48GB of VRAM... I run the server's display output on the integrated GPU (via the CPU), so both 3090's are idling at 0 VRAM usage.
However, even lowering chunk_size to 512 won't let it load, and I still go OOM.
Looks like I'll need to lower the context from 32k to around 28k, but I'll keep testing to find the max... Curious if you see the same? I'm running it on the latest version of TabbyAPI.
-EDIT- Okay, I got it to load with 16k context... Will test it out, but I might prefer a 4.85bpw quant to keep the full 32k context.
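For reference, here's roughly what I'm tweaking in TabbyAPI's config.yml — key names follow TabbyAPI's sample config, the model name is just a placeholder, and the values are only what worked on my setup, not a recommendation:

```yaml
# Sketch of the model section in TabbyAPI's config.yml (names per config_sample.yml).
model:
  model_name: your-exl2-quant-here  # placeholder — whatever quant folder you loaded
  max_seq_len: 16384   # 32768 OOMs for me on 2x3090 (48GB); 16k loads
  chunk_size: 512      # lowered from the default; didn't avoid OOM on its own
  cache_mode: FP16     # a quantized cache (e.g. Q4) might be another way to save VRAM
```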