<h1 style='color: purple;'>Synthetic multiple choice </h1>

To analyse our methods, we synthesise data from models with known accuracy in a multiple choice setting, i.e. one with a discrete set of possible responses. Several parameters can affect the quality of the results: the number of models, model accuracy, the number of prompts, the number of possible answers, and noise in the comparisons. Rankings can be recovered in a range of challenging cases, for instance when the accuracy of the underlying models is low or when the evaluation function is noisy and imperfect. When the number of possible answers is low, for example in binary choice settings, recovering rankings becomes challenging. In general, low variance among wrong answers (i.e. wrong models tending to give the same wrong answer) causes triplet evaluations to treat a wrong answer as the right one.
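The sketch below is a minimal illustration of this synthetic setup and of the binary-choice failure mode. It makes two assumptions. First, each model answers correctly with its known accuracy and otherwise picks uniformly at random among the wrong options. Second, option `0` is always the correct one. The names `simulate_answers`, `wrong_agreement_rate` and `agreement_ranking` are hypothetical and are not this Space's code. The ranking uses a simple pairwise-agreement score as a stand-in; it is *not* the triplet evaluation studied here. The sketch covers the number of models, accuracy, number of prompts and number of answers, but leaves out noisy comparisons.

```python
import random
from typing import List

CORRECT = 0  # hypothetical convention: option 0 is always the correct answer


def simulate_answers(accuracies: List[float], n_prompts: int, n_choices: int,
                     seed: int = 0) -> List[List[int]]:
    """Synthesise answers[m][q] for model m on prompt q.

    Assumption: model m is correct with probability accuracies[m]; otherwise it
    picks one of the n_choices - 1 wrong options uniformly at random.
    """
    if n_choices < 2:
        raise ValueError("need at least two answer options")
    rng = random.Random(seed)
    answers = []
    for acc in accuracies:
        row = []
        for _ in range(n_prompts):
            if rng.random() < acc:
                row.append(CORRECT)
            else:
                row.append(rng.randint(1, n_choices - 1))  # one of the wrong options
        answers.append(row)
    return answers


def wrong_agreement_rate(a: List[int], b: List[int]) -> float:
    """Among prompts where both models are wrong, the fraction with the same wrong answer."""
    both_wrong = [(x, y) for x, y in zip(a, b) if x != CORRECT and y != CORRECT]
    if not both_wrong:
        return float("nan")
    return sum(x == y for x, y in both_wrong) / len(both_wrong)


def agreement_ranking(answers: List[List[int]]) -> List[int]:
    """Rank models (best first) by how often they agree with the other models.

    A consensus stand-in for illustration only (not the triplet procedure);
    it never looks at the correct answer.
    """
    n_models = len(answers)
    n_prompts = len(answers[0])
    scores = []
    for i in range(n_models):
        agree = 0
        for j in range(n_models):
            if j != i:
                agree += sum(x == y for x, y in zip(answers[i], answers[j]))
        scores.append(agree / n_prompts)
    return sorted(range(n_models), key=lambda m: scores[m], reverse=True)


if __name__ == "__main__":
    # Low-accuracy models; the true ranking (best first) is [3, 2, 1, 0].
    accuracies = [0.15, 0.25, 0.35, 0.45]
    for k in (10, 2):
        answers = simulate_answers(accuracies, n_prompts=50_000, n_choices=k, seed=1)
        rate = wrong_agreement_rate(answers[0], answers[1])
        print(f"k={k}: wrong-answer agreement {rate:.3f}, ranking {agreement_ranking(answers)}")
```

With 10 options, two wrong models coincide on only about 1/9 of their shared mistakes, so agreement tracks accuracy and the ranking should come out as `[3, 2, 1, 0]`. With 2 options every shared mistake is the same answer. Because all four models are below 50% accuracy here, the wrong answer is in expectation the majority answer, so the agreement ranking flips to `[0, 1, 2, 3]`. This is the same "wrong answer treated as the right one" effect described above.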