
fp16?

#1
by son-of-man

It seems like there might be a more significant difference between quants (even Q8) and fp16 on Llama 3 than there was on Llama 2 or Mistral.
Since it's such a small model, running an fp16 GGUF isn't too hard on consumer hardware, so the extra memory use seems like a worthwhile tradeoff for the increased nuance and coherence it might offer.
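If anyone wants to try it, here's a minimal sketch of loading an fp16 GGUF with llama-cpp-python; the file name and sampling settings are placeholders, so point them at whatever you've converted locally.

```python
from llama_cpp import Llama

# Minimal sketch: load an fp16 GGUF and run a short completion.
# "llama3-8b-f16.gguf" is a placeholder; use your own file path.
llm = Llama(
    model_path="llama3-8b-f16.gguf",
    n_ctx=8192,        # Llama 3 supports an 8k context window
    n_gpu_layers=-1,   # offload every layer to the GPU if VRAM allows
)

out = llm(
    "Once upon a time,",
    max_tokens=128,
    temperature=0.8,
)
print(out["choices"][0]["text"])
```

If you need to produce the fp16 file yourself, llama.cpp's conversion script (convert_hf_to_gguf.py in recent versions, though the name has changed over time) can emit one with --outtype f16.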
