Feedback #2
by KeyboardMasher - opened
Here are my observations after trying this model in Q8_0 quantization with greedy decoding, compared against Llama 3.1 Instruct under the same quantization, sampling parameters, and system prompt:
- Does not handle false-premise questions well. Unlike L3.1, it does not correct the user but makes up a wrong justification.
  Example: "Why do numbers in Slitherlink puzzle can go only up to 2?" (They can actually go up to 3.)
- Hallucinates about obscure real-world facts noticeably more than L3.1.
  Example: ask it about small towns around the world and compare the answers to their Wikipedia entries.
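For anyone wanting to reproduce the false-premise observation above, here is a minimal sketch of a check harness. It is an assumption, not part of the original report: `stub_generate` stands in for whatever chat backend you use (e.g. a greedy-decoded Q8_0 model), and the correction hints are just illustrative phrasings.

```python
# Hypothetical false-premise probe: ask a loaded question and check
# whether the reply pushes back on the wrong premise.
# `stub_generate` is a stand-in for a real model call, not a real API.

FALSE_PREMISE = "Why do numbers in Slitherlink puzzle can go only up to 2?"
# Phrasings that suggest the model corrected the premise (illustrative only).
CORRECTION_HINTS = ("up to 3", "0 to 3", "can be 3")

def corrects_premise(reply: str) -> bool:
    """Return True if the reply appears to correct the false premise."""
    reply = reply.lower()
    return any(hint in reply for hint in CORRECTION_HINTS)

def stub_generate(prompt: str) -> str:
    # Stand-in answer: a model that handles false premises well
    # should respond roughly like this.
    return "Actually, Slitherlink clues range from 0 up to 3, not 2."

if __name__ == "__main__":
    reply = stub_generate(FALSE_PREMISE)
    print("corrected" if corrects_premise(reply) else "accepted premise")
```

A string-match check like this is crude; in practice you would eyeball the replies, but it makes the comparison against L3.1 easy to script over a batch of such questions.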
Thanks @KeyboardMasher -- this is largely our experience too, but we need to learn more about training models so that they quantize well.
natolambert changed discussion status to closed