Feedback #2
by KeyboardMasher - opened
Here are my observations after trying this model in Q8_0 quantization with greedy decoding, compared against Llama 3.1 Instruct under the same quantization, sampling parameters, and system prompt:
- Does not handle false-premise questions well. Unlike L3.1, it does not correct the user but makes up a wrong justification.
  Example: "Why do numbers in Slitherlink puzzle can go only up to 2?" (They can actually go up to 3.)
- Hallucinates about obscure real-world facts noticeably more than L3.1.
  Example: ask it about small towns around the world and compare the answers to their Wikipedia entries.
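For anyone wanting to reproduce the false-premise observation above, here is a minimal sketch of a check harness. It is an assumption, not part of the original report: `stub_generate` stands in for whatever chat backend you use (e.g. a greedy-decoded Q8_0 model), and the correction hints are just illustrative phrasings.

```python
# Hypothetical false-premise probe: ask a loaded question and check
# whether the reply pushes back on the wrong premise.
# `stub_generate` is a stand-in for a real model call, not a real API.

FALSE_PREMISE = "Why do numbers in Slitherlink puzzle can go only up to 2?"
# Phrasings that suggest the model corrected the premise (illustrative only).
CORRECTION_HINTS = ("up to 3", "0 to 3", "can be 3")

def corrects_premise(reply: str) -> bool:
    """Return True if the reply appears to correct the false premise."""
    reply = reply.lower()
    return any(hint in reply for hint in CORRECTION_HINTS)

def stub_generate(prompt: str) -> str:
    # Stand-in answer: a model that handles false premises well
    # should respond roughly like this.
    return "Actually, Slitherlink clues range from 0 up to 3, not 2."

if __name__ == "__main__":
    reply = stub_generate(FALSE_PREMISE)
    print("corrected" if corrects_premise(reply) else "accepted premise")
```

A string-match check like this is crude; in practice you would eyeball the replies, but it makes the comparison against L3.1 easy to script over a batch of such questions.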
Thanks @KeyboardMasher -- this is largely our experience too, but we need to learn more about training models so that they quantize well.
natolambert changed discussion status to closed