This Space applies item response theory (the two-parameter logistic, or 2PL, model) to results from the Hugging Face Open LLM Leaderboard. A separate model was fit for each evaluation framework covered by the leaderboard, and each top-level tab corresponds to one of them. Within each tab, sub-tabs correspond to individual parameters of the model, and each sub-tab presents a table of results:
- For the item-related parameters, results are listed per question presented to the language models. For brevity, questions are identified by their hash. Click a row to see the details of that question.
- For the person-related parameter, results are listed per language model. This tab supports comparing models by their estimated ability. See the interface below the table for details.
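To illustrate what the item and person parameters mean, the sketch below assumes the common 2PL parameterization P(correct) = logistic(a_j (θ_i − b_j)). Here a_j is a question's discrimination, b_j is its difficulty, and θ_i is a language model's ability. The Space's Stan model may parameterize this differently. The sketch also shows one hypothetical way to compare two models from posterior draws of their abilities: the fraction of paired draws in which one ability exceeds the other. The function names and the draws are made up for illustration and are not taken from the Space.

```python
import math
from typing import Sequence


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed a * (theta - b) form):
    probability that a model with ability `theta` answers a question
    with discrimination `a` and difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def prob_ability_greater(draws_a: Sequence[float], draws_b: Sequence[float]) -> float:
    """Hypothetical helper: share of paired posterior draws in which
    model A's ability exceeds model B's."""
    if len(draws_a) != len(draws_b) or len(draws_a) == 0:
        raise ValueError("need equal-length, non-empty sequences of draws")
    wins = sum(1 for ta, tb in zip(draws_a, draws_b) if ta > tb)
    return wins / len(draws_a)


if __name__ == "__main__":
    # When ability equals difficulty, the 2PL probability is exactly 0.5.
    print(p_correct(theta=0.3, a=1.7, b=0.3))  # 0.5
    # Made-up posterior draws for two models: A wins in 3 of 4 pairs.
    print(prob_ability_greater([0.9, 1.1, 0.7, 1.3], [0.8, 0.6, 0.9, 1.0]))  # 0.75
```

Larger discrimination makes the curve steeper around the difficulty, so a question with high a_j separates models just above and just below b_j more sharply.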
The code that produced the results in this Space, including the Stan model used for sampling, is available on GitHub.
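The Stan model itself lives in the linked repository and is not reproduced here. The following is a hedged sketch of the quantity a 2PL sampler typically targets. It assumes binary correct/incorrect responses and the same a·(θ − b) parameterization, and it omits priors. These are assumptions, and the actual Stan code may differ. It computes the log-likelihood of a response matrix y[i][j] (model i, question j). The function `log_lik_2pl` is hypothetical.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_lik_2pl(y: Sequence[Sequence[int]], theta: Sequence[float],
                a: Sequence[float], b: Sequence[float]) -> float:
    """Bernoulli-logit log-likelihood of a 0/1 response matrix under an
    assumed 2PL model with logit a[j] * (theta[i] - b[j])."""
    total = 0.0
    for i, row in enumerate(y):
        for j, y_ij in enumerate(row):
            eta = a[j] * (theta[i] - b[j])  # logit of P(correct)
            # log P(y=1) = -log(1 + exp(-eta)); log P(y=0) = -log(1 + exp(eta))
            total -= softplus(-eta) if y_ij == 1 else softplus(eta)
    return total


if __name__ == "__main__":
    # With theta == b, every response has probability 0.5, so each of the
    # four entries contributes log(0.5).
    print(log_lik_2pl([[1, 0], [0, 1]], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]))
```

In Stan this would usually be written with a `bernoulli_logit` sampling statement. The softplus form here only avoids overflow when the logits are large.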