Feedback

#2
by KeyboardMasher - opened

Here are my observations after trying this model at Q8_0 quantization with greedy decoding, compared against Llama 3.1 Instruct at the same quantization, sampling parameters, and system prompt (a reproduction sketch follows the list):

  1. Does not handle false-premise questions well. Unlike L3.1, it does not correct the user but makes up a wrong justification.
    Example - "Why can numbers in a Slitherlink puzzle only go up to 2?" (They can go up to 3.)
  2. Hallucinates about obscure real-world facts noticeably more than L3.1.
    Example - ask it about small towns around the world and compare its answers to the Wikipedia entries.
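For anyone wanting to replicate, here is a minimal sketch of the comparison described above: both models at Q8_0, greedy decoding, identical system prompt. This is not the poster's actual harness; the runtime (llama-cpp-python is assumed here), the system prompt, and the model filenames are all placeholders.

```python
from llama_cpp import Llama

SYSTEM_PROMPT = "You are a helpful assistant."  # hypothetical; the poster's prompt wasn't shared
FALSE_PREMISE = "Why can numbers in a Slitherlink puzzle only go up to 2?"

def greedy_answer(model_path: str, question: str) -> str:
    # Load the Q8_0 GGUF and run a single chat turn with greedy decoding.
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0.0,  # temperature 0 gives greedy decoding in llama.cpp
        max_tokens=256,
    )
    return out["choices"][0]["message"]["content"]

# Hypothetical filenames -- substitute your own Q8_0 quants of each model.
for path in ["this-model.Q8_0.gguf", "llama-3.1-instruct.Q8_0.gguf"]:
    print(path, "->", greedy_answer(path, FALSE_PREMISE))
```

Running the same false-premise prompt through both models side by side makes it easy to see whether a model corrects the premise or invents a justification for it.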

Thanks @KeyboardMasher -- this largely matches our experience too, but we need to learn more about training models so that they quantize well.

natolambert changed discussion status to closed
