Q6_K vs Q6_K_L

by mahdisml - opened 1 day ago

1 day ago

I couldn't find good information about Q6_K vs Q6_K_L (speed, accuracy, etc.).
All the articles and discussions I've found compare IQ4 vs Q4 vs Q5 vs Q6 😞

bartowski

Owner 1 day ago

The only difference is that Q6_K_L uses Q8_0 for the embedding and output weights instead of Q6_K, so it could give slightly better output quality.

In practice it's hard to find much difference, especially at the larger quant sizes. Q2_K vs Q2_K_L is the only comparison where a major difference shows up.
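To put that description in numbers, here is a rough sketch of the extra file size (and VRAM) the `_L` variant costs. It assumes the block layouts of the GGML quant formats used by llama.cpp (Q6_K: 210 bytes per 256 weights, Q2_K: 84 bytes per 256, Q8_0: 34 bytes per 32) and a hypothetical Llama-3-8B-like shape (128,256-token vocabulary, 4,096 hidden size, untied embedding and output tensors). The names `BLOCK_LAYOUT`, `bits_per_weight` and `tensor_bytes` are illustrative only, not part of any library.

```python
# Block layouts of the GGML quant formats: (bytes per block, weights per block).
BLOCK_LAYOUT = {
    "Q2_K": (84, 256),   # 2.625 bits/weight
    "Q6_K": (210, 256),  # 6.5625 bits/weight
    "Q8_0": (34, 32),    # fp16 scale + 32 int8 values = 8.5 bits/weight
}


def bits_per_weight(qtype: str) -> float:
    """Average storage cost of one weight, including the per-block scales."""
    block_bytes, block_size = BLOCK_LAYOUT[qtype]
    return block_bytes * 8 / block_size


def tensor_bytes(n_weights: int, qtype: str) -> int:
    """Bytes needed to store n_weights in qtype (n_weights must fill whole blocks)."""
    block_bytes, block_size = BLOCK_LAYOUT[qtype]
    if n_weights % block_size:
        raise ValueError("weight count must be a multiple of the block size")
    return n_weights // block_size * block_bytes


# Hypothetical model shape (assumption): 128,256-token vocab, 4,096 hidden size,
# with separate (untied) token-embedding and output tensors of that shape.
VOCAB, HIDDEN = 128_256, 4_096
n_weights = VOCAB * HIDDEN

# Q6_K -> Q6_K_L: store both tensors as Q8_0 instead of Q6_K.
extra_bytes = 2 * (tensor_bytes(n_weights, "Q8_0") - tensor_bytes(n_weights, "Q6_K"))

for qtype in BLOCK_LAYOUT:
    print(f"{qtype}: {bits_per_weight(qtype):.4f} bits/weight")
print(f"Q6_K_L extra size: {extra_bytes / 2**20:.1f} MiB")
```

For this hypothetical shape the switch costs roughly 243 MiB, a few percent of a Q6_K file for a model of that size. The bits-per-weight figures also hint at why Q2_K_L can stand out: if those tensors would otherwise sit at the base type, going from about 2.6 to 8.5 bits/weight is a far bigger jump than going from about 6.6 to 8.5 (llama.cpp's quantizer may choose other types for these tensors, so treat this as an approximation).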

SerialKicked

1 day ago

Frankly, Q6 is already so close to Q8 that the _L variant is pretty much a waste of VRAM and compute time; that memory is better spent on context length.
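That trade-off can be estimated because the KV cache costs a fixed number of bytes per token of context. This sketch assumes a hypothetical Llama-3-8B-like model (32 layers, 8 KV heads with grouped-query attention, head dimension 128, f16 cache, untied 128,256 x 4,096 embedding and output tensors) and the GGML block sizes for Q6_K (210 bytes per 256 weights) and Q8_0 (34 bytes per 32 weights). None of these numbers come from the thread, and the helper names are made up for illustration.

```python
# Hypothetical Llama-3-8B-like shape (assumption, not from the thread).
VOCAB, HIDDEN = 128_256, 4_096
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128
KV_BYTES_PER_ELEM = 2  # f16 K and V cache entries


def kv_bytes_per_token() -> int:
    # Each token stores one K and one V vector per layer and per KV head.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES_PER_ELEM


def l_variant_extra_bytes() -> int:
    # Embedding + output tensors stored as Q8_0 (34 B per 32 weights)
    # instead of Q6_K (210 B per 256 weights).
    n = VOCAB * HIDDEN
    return 2 * (n // 32 * 34 - n // 256 * 210)


per_token = kv_bytes_per_token()
extra_tokens = l_variant_extra_bytes() // per_token
print(f"KV cache: {per_token // 1024} KiB/token; "
      f"the _L overhead would hold ~{extra_tokens} more tokens")
```

Under these assumptions the `_L` overhead is worth a little under 2K tokens of f16 context.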

bartowski

Owner 1 day ago

Yeah, I mostly agree. I don't necessarily recommend them over the regular ones, but some people do, and I'm nothing if not a people pleaser...

mahdisml

1 day ago

Thank you so much! 🙏🙏

webboty

about 3 hours ago

Interesting... Which would you say is the better choice, Q5_K_L or Q6_K, for a good speed/quality ratio and a large context window?

bartowski

Owner about 2 hours ago

I'd probably just use Q6_K for the improved quality 🤗
