This Space applies [item response
theory](https://en.wikipedia.org/wiki/Item_response_theory) (2PL) to
results of the Hugging Face [OpenLLM
Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard). Separate
models were fit for each [evaluation
framework](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#tasks)
covered in the leaderboard; each top-level tab corresponds to
one. Within each tab are sub-tabs corresponding to individual parameters
from the model. Each sub-tab presents a table of results:

* For item-related parameters, results are over questions presented to
  the language models. For brevity, questions are listed using their
  hash. Details of a question can be found by clicking the row of
  interest.
* The person-related parameter is over language models. This tab
  supports comparison between models based on their _ability_. See the
  interface below the table for details.

Code that produced the results in this Space can be found on
[GitHub](https://github.com/jerome-white/leaderboard-item-response),
including the [Stan
model](https://github.com/jerome-white/leaderboard-item-response/blob/1334a1bf4cd9b04333bb1726c78bae0c03eec00b/src/model/model.stan)
that drove sampling.

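
For readers new to 2PL, the sketch below shows how item and person
parameters combine in the textbook form of the model. It is an
illustration only: the function name, argument names, and all numeric
values are hypothetical, and the linked Stan model may use a different
parameterization (for example, a different sign or scaling convention).

```python
import math


def p_correct(ability, discrimination, difficulty):
    """Textbook 2PL: probability that a person answers an item correctly.

    ability        -- person parameter (here, one per language model)
    discrimination -- item parameter: how sharply the item separates abilities
    difficulty     -- item parameter: ability at which P(correct) is 0.5

    Assumption: this is the standard logistic form, not necessarily the
    exact form used by the Stan model in the repository.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical abilities for two language models.
abilities = {"model-a": 1.0, "model-b": -0.5}

# Hypothetical item parameters for a single question.
question = {"discrimination": 1.5, "difficulty": 0.2}

for name, theta in abilities.items():
    prob = p_correct(theta, **question)
    print(f"{name}: P(correct) = {prob:.3f}")
```

Under this form, a model whose ability equals an item's difficulty has
an even chance of answering correctly, and a larger discrimination
makes the probability change more steeply around that point.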