
Performance of quantised models

#3
by danielus - opened

Is there any way to work out how much 'performance' the quantised versions lose compared to the original, so you can get an idea of which quantisation level to choose and maximise the ratio of generation quality to resources used?
In the llama.cpp GitHub repository I only found an old post listing the accuracy of the different quantisation levels, but I assume it has become obsolete given the speed at which this field moves!

Trust me, the 8-bit is worth it, at least when it comes to reasoning.
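
If you want a concrete number for your own setup, llama.cpp ships a perplexity tool that scores a model against a plain-text file (lower perplexity is better). Below is a minimal Python sketch that runs it over a few quantisation levels and reports the degradation relative to the full-precision model. The binary name (`llama-perplexity` here; `perplexity` in older builds), the model filenames, the test file, and the "PPL = ..." output line it parses are assumptions to adapt to your own build and files.

```python
import re
import subprocess

# Assumed paths -- adjust to your own build, models, and test corpus.
PERPLEXITY_BIN = "./llama-perplexity"   # called "perplexity" in older llama.cpp builds
TEST_FILE = "wiki.test.raw"             # any representative plain-text file works
MODELS = {
    "F16":    "model-F16.gguf",         # full-precision baseline
    "Q8_0":   "model-Q8_0.gguf",
    "Q4_K_M": "model-Q4_K_M.gguf",
}

def measure_ppl(model_path: str) -> float:
    """Run llama.cpp's perplexity tool and parse the final PPL estimate."""
    proc = subprocess.run(
        [PERPLEXITY_BIN, "-m", model_path, "-f", TEST_FILE],
        capture_output=True, text=True, check=True,
    )
    # The tool logs to both streams; assume a final line like "PPL = 5.4007".
    match = re.search(r"PPL = ([0-9.]+)", proc.stdout + proc.stderr)
    if match is None:
        raise RuntimeError(f"no PPL found in output for {model_path}")
    return float(match.group(1))

baseline = measure_ppl(MODELS["F16"])
for name, path in MODELS.items():
    ppl = measure_ppl(path)
    print(f"{name}: PPL {ppl:.4f} ({100 * (ppl - baseline) / baseline:+.2f}% vs F16)")
```

A small relative increase in perplexity suggests that quantisation level costs you little in practice; in published llama.cpp comparisons the 8-bit delta is typically tiny, which matches the experience above.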
