This Space applies [item response
theory](https://en.wikipedia.org/wiki/Item_response_theory),
specifically the two-parameter logistic (2PL) model, to
results of the Hugging Face [OpenLLM
Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard). A
separate model was fit for each [evaluation
framework](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#tasks)
covered in the leaderboard; each top-level tab corresponds to
one. Within each tab are sub-tabs corresponding to individual parameters
from the model. Each of these sub-tabs presents a table of results:

* For item-related parameters, results are over the questions presented
  to the language models. For brevity, questions are listed by their
  hash. Details of a question can be found by clicking the row of
  interest.
* For the person-related parameter, results are over the language
  models. This tab supports comparison between models based on their
  _ability_. See the interface below the table for details.
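
To make the item and person parameters concrete, here is a minimal
sketch of the textbook 2PL response function. It assumes the standard
parameterization, in which each question has a _discrimination_ and a
_difficulty_ and each language model has an _ability_; the Stan model
linked below may be parameterized differently. The function name
`p_correct` and the numeric values are hypothetical, chosen only for
illustration.

```python
import math


def p_correct(ability: float, discrimination: float, difficulty: float) -> float:
    """Textbook 2PL: probability that a model answers a question correctly.

    Assumes the standard logistic form; not taken from this Space's code.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical item parameters, for illustration only.
item = {"discrimination": 1.5, "difficulty": 0.0}

# Hypothetical abilities for three language models.
for ability in (-1.0, 0.0, 1.0):
    print(f"ability={ability:+.1f}  P(correct)={p_correct(ability, **item):.3f}")
```

Under this form, a model whose ability equals a question's difficulty
answers correctly with probability 0.5, and a larger discrimination
makes that probability change more sharply as ability moves away from
the difficulty.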

Code that produced the results in this Space can be found on
[GitHub](https://github.com/jerome-white/leaderboard-item-response),
including the [Stan
model](https://github.com/jerome-white/leaderboard-item-response/blob/1334a1bf4cd9b04333bb1726c78bae0c03eec00b/src/model/model.stan)
that drove sampling.