This Space applies [item response
theory](https://en.wikipedia.org/wiki/Item_response_theory) (the
two-parameter logistic model, 2PL) to
results of the Hugging Face [OpenLLM
Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard). Separate
models were fit for each [evaluation
framework](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#tasks)
covered in the leaderboard; each top-level tab corresponds to
one. Within each tab, sub-tabs correspond to individual parameters
from the model. Each sub-tab presents a table of results:
* For item-related parameters, results are over the questions presented to
the language models. For brevity, questions are listed by their
hash. Details of a question can be found by clicking the row of
interest.
* The person-related parameter is over language models. This tab
supports comparing models by their _ability_. See the
interface below the table for details.
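
As a rough illustration of how the item and person parameters fit together: in the textbook 2PL formulation, the probability that a language model answers a question correctly is a logistic function of the model's ability, the question's difficulty, and the question's discrimination. The sketch below uses that standard form. The function name, argument names, and numbers are illustrative assumptions, not values from this Space, and the Stan model used here may parameterize things differently.

```python
import math


def prob_correct(ability, difficulty, discrimination):
    """Textbook 2PL response probability (names are illustrative).

    ability        -- person parameter (here, a language model)
    difficulty     -- item parameter (here, a benchmark question)
    discrimination -- item parameter: how sharply the item separates
                      low-ability from high-ability respondents
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Made-up values for a single question, purely for illustration.
difficulty, discrimination = 0.0, 1.5

# A model whose ability equals the question's difficulty has even odds.
p_equal = prob_correct(0.0, difficulty, discrimination)

# Higher ability raises the probability; lower ability reduces it.
p_strong = prob_correct(1.0, difficulty, discrimination)
p_weak = prob_correct(-1.0, difficulty, discrimination)

print(p_equal, p_strong, p_weak)
```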

Code that produced the results in this Space can be found on
[GitHub](https://github.com/jerome-white/leaderboard-item-response),
including the [Stan
model](https://github.com/jerome-white/leaderboard-item-response/blob/1334a1bf4cd9b04333bb1726c78bae0c03eec00b/src/model/model.stan)
that drove sampling.