TLDR

#1
by IZA09

I don't mean to be lazy, but would it be possible for you to simply list exactly what "feels" different between all of these? You say they each have their own quirks; care to elaborate?

Owner

I thought about doing this, and then the issue of use cases came up.
I could say model "X" performs under "Y" conditions, this model under those conditions... and so on.
I would actually have preferred that, but it is an impossible task because of the variability of LLMs.

To address this, each model in this series uses the same "examples" - so comparing these, and the differences between them, is the best way to see how each model handles a prompt and its output. (Temp and all other parameters are the same per test, per model.)
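Something like this rough sketch shows the idea, using the Hugging Face transformers API (the repo IDs and the prompt are placeholders, not the actual test set): one fixed prompt and identical sampling parameters, run across each model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODELS = ["your-org/model-A", "your-org/model-B"]  # placeholder repo IDs
PROMPT = "Write the opening paragraph of a horror story set in a lighthouse."
PARAMS = dict(do_sample=True, temperature=0.8, top_p=0.95, max_new_tokens=200)

for name in MODELS:
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    torch.manual_seed(42)                     # same seed for every model
    inputs = tok(PROMPT, return_tensors="pt")
    out = model.generate(**inputs, **PARAMS)  # identical parameters per model
    print(f"--- {name} ---")
    print(tok.decode(out[0], skip_special_tokens=True))
```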

Changing one word, or adding a comma, in a prompt can radically change the output... and reveal or hide a model's characteristics.

Once you introduce "temp" (and other parameters) into the equation, the output changes drastically.
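To see that effect in isolation, here is a minimal sketch that holds everything constant except temperature (same placeholder model as above; the seed is fixed so only the sampler setting varies):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "your-org/model-A"                     # placeholder repo ID
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
inputs = tok("Write the opening paragraph of a horror story.",
             return_tensors="pt")

for temp in (0.2, 0.8, 1.5):
    torch.manual_seed(42)                     # same seed; only temp changes
    out = model.generate(**inputs, do_sample=True, temperature=temp,
                         top_p=0.95, max_new_tokens=200)
    print(f"--- temperature={temp} ---")
    print(tok.decode(out[0], skip_special_tokens=True))
```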

Likewise, when I release a model (other than in this series), I have between 5 and 20 versions of it, and select the best of these after testing.
It is never an easy choice.

EQ-Bench does 10 runs per prompt and multiple prompts per model, then runs analytics.
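Roughly that protocol, as a sketch (this is not EQ-Bench's actual code; generate() and score() are stand-ins for a real model call and the benchmark's own metric):

```python
import statistics

N_RUNS = 10
PROMPTS = ["prompt one ...", "prompt two ..."]  # placeholder prompts

def generate(prompt: str) -> str:
    """Placeholder: sample one completion with fixed parameters."""
    return prompt  # stand-in so the sketch runs; wire up a real model here

def score(text: str) -> float:
    """Placeholder metric; EQ-Bench applies its own scoring, not this."""
    return float(len(text))

# 10 sampled runs per prompt, then simple aggregate stats per prompt.
for p in PROMPTS:
    scores = [score(generate(p)) for _ in range(N_RUNS)]
    print(f"{p[:30]:<30} mean={statistics.mean(scores):.2f} "
          f"sd={statistics.stdev(scores):.2f}")
```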
