	This Space applies [item response
	theory](https://en.wikipedia.org/wiki/Item_response_theory) (2PL) to
	results of the Hugging Face [OpenLLM
	Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard). Separate
	models were fit for each [evaluation
	framework](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#tasks)
	covered in the leaderboard; each top-level tab corresponds to
	one. Within each tab, sub-tabs correspond to individual parameters
	from the model. Each tab presents a table of results:

	* For item-related parameters, results are over questions presented to
	the language models. For brevity, questions are listed using their
	hash. Details of the question can be found by clicking the row of
	interest.
	* The person-related parameter is over language models. This tab
	supports comparison between models based on their _ability_. See the
	interface below the table for details.
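
As a rough illustration of how the item and person parameters interact, the following is a minimal sketch of the standard 2PL item response function. It assumes the common parameterization in which the probability of a correct answer is the logistic of `discrimination * (ability - difficulty)`; the Stan model linked below may parameterize things differently. The function name `p_correct` and all numeric values are hypothetical and are not taken from this Space's results.

```python
import math


def p_correct(ability, difficulty, discrimination):
    """Probability of a correct answer under a standard 2PL model.

    Assumed form: logistic(discrimination * (ability - difficulty)).
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical values chosen only for illustration.
strong_model, weak_model = 1.0, -0.5  # person (ability) parameters
difficulty, discrimination = 0.0, 1.5  # item parameters for one question

print(p_correct(strong_model, difficulty, discrimination))
print(p_correct(weak_model, difficulty, discrimination))
```

Under this form, a model whose ability equals an item's difficulty has a 50% chance of answering it correctly, and a larger discrimination makes the probability change more sharply as ability moves away from the difficulty.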

	Code that produced the results in this Space can be found on
	[GitHub](https://github.com/jerome-white/leaderboard-item-response),
	including the [Stan
	model](https://github.com/jerome-white/leaderboard-item-response/blob/1334a1bf4cd9b04333bb1726c78bae0c03eec00b/src/model/model.stan)
	that drove sampling.