fp16?
#1 opened by son-of-man
It seems like the gap between quants (even Q8) and fp16 is larger on llama3 than it was on llama2 or mistral.
Since it's such a small model, running an fp16 gguf is still feasible on consumer hardware, so the extra memory and slower inference seem like a worthwhile tradeoff for whatever nuance and coherence quantization might be costing.
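If anyone wants to sanity-check this locally, here's a minimal sketch using llama-cpp-python (an assumption on my part; the thread doesn't prescribe a tool, and the model paths and prompt below are placeholders). It runs greedy decoding on the same prompt with the Q8_0 and fp16 GGUFs and shows whether the outputs diverge:

```python
# Rough side-by-side check with llama-cpp-python (pip install llama-cpp-python).
# File names are placeholders; point them at your own Q8_0 and fp16 GGUFs.
from llama_cpp import Llama

PROMPT = "Explain why the sky is blue in one paragraph."

def greedy_completion(model_path: str) -> str:
    # temperature=0.0 makes sampling deterministic, so any difference
    # in output comes from the weights/quantization, not randomness.
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    out = llm(PROMPT, max_tokens=128, temperature=0.0)
    return out["choices"][0]["text"]

q8_text = greedy_completion("llama3-8b-instruct.Q8_0.gguf")
fp16_text = greedy_completion("llama3-8b-instruct.fp16.gguf")

print("Q8_0 :", q8_text)
print("fp16 :", fp16_text)
print("Identical:", q8_text == fp16_text)
```

Identical greedy outputs on one prompt don't prove the quants are equivalent, and divergence on one prompt doesn't prove a meaningful quality gap; a perplexity comparison over a real test set (llama.cpp ships a perplexity example for this) would be the more rigorous check.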