This Space applies [item response theory](https://en.wikipedia.org/wiki/Item_response_theory) (IRT), specifically the two-parameter logistic (2PL) model, to results from the Hugging Face [OpenLLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard). A separate model was fit for each [evaluation framework](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#tasks) covered in the leaderboard; each top-level tab corresponds to one framework. Within each tab, sub-tabs correspond to individual parameters of the model. Each tab presents a table of results:

* For item-related parameters, results are over the questions presented to the language models. For brevity, questions are listed by their hash. Details of a question can be found by clicking the row of interest.
* The person-related parameter is over language models. This tab supports comparison between models based on their _ability_. See the interface below the table for details.

Code that produced the results in this Space can be found on [GitHub](https://github.com/jerome-white/leaderboard-item-response), including the [Stan model](https://github.com/jerome-white/leaderboard-item-response/blob/1334a1bf4cd9b04333bb1726c78bae0c03eec00b/src/model/model.stan) that drove sampling.
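
As background on the 2PL model mentioned above, the sketch below shows the textbook item response function, in which the probability of a correct answer depends on a model's ability and on a question's difficulty and discrimination. The function name, argument names, and example values are hypothetical and chosen for illustration; the linked Stan model remains the authoritative definition and may use a different parameterization.

```python
import math


def p_correct(ability: float, difficulty: float, discrimination: float) -> float:
    """Textbook 2PL item response function.

    P(correct) = 1 / (1 + exp(-a * (theta - b)))
    where theta is the person ability, b is the item difficulty,
    and a is the item discrimination.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


if __name__ == "__main__":
    # Hypothetical values, not taken from the Space's results.
    for ability in (-1.0, 0.0, 1.0):
        p = p_correct(ability, difficulty=0.0, discrimination=1.5)
        print(f"ability={ability:+.1f} -> P(correct)={p:.3f}")
```

Under this formulation, a language model whose ability equals a question's difficulty answers correctly with probability 0.5, and a larger discrimination makes that probability change more sharply as ability moves away from the difficulty.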