Some other quantizations
#1 opened by localAGI
Hey, any chance you could add an fp16 variant of the model?
Does it make any difference at execution time?
I am running on GPU. AFAIK an fp16 model would be around 28 GB, so it should do nicely with 80-90% of it offloaded to a 24 GB VRAM card.
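For reference, here is a rough sketch of how the conversion might look with CTranslate2's Python converter API; the Hugging Face model ID and output directory below are just placeholders, not the actual repo:

```python
from ctranslate2.converters import TransformersConverter

# Convert a Hugging Face Transformers checkpoint to CTranslate2 format,
# storing the weights in float16. Model ID and output dir are placeholders.
converter = TransformersConverter("org/model-name")
converter.convert("model-ct2-fp16", quantization="float16")
```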
Might be able to do it.
I'm just not sure whether partial offloading is supported by CTranslate2, and I'm also not sure why you would want to load in fp16. Also, the fp16 model would be 32 GB.
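As an aside, if the goal is just fp16 execution rather than an fp16 checkpoint, CTranslate2 can (as far as I know) cast the weights to the requested type at load time via the `compute_type` argument, so a separate fp16 upload may not even be needed. Rough sketch, with the model directory and tokenizer ID as placeholders:

```python
import ctranslate2
import transformers

# Load the converted model and request fp16 execution at load time.
# Model directory and tokenizer ID below are placeholders.
generator = ctranslate2.Generator(
    "model-ct2",
    device="cuda",
    compute_type="float16",
)

tokenizer = transformers.AutoTokenizer.from_pretrained("org/model-name")
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello, world"))

# generate_batch takes pre-tokenized input and returns token ID sequences.
results = generator.generate_batch([tokens], max_length=64)
print(tokenizer.decode(results[0].sequences_ids[0]))
```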