open_llm_leaderboard

are benchmark scores normalised to a baseline?

by Abulaphia - opened Sep 20, 2024

In the documentation, I see reference to a baseline model for GSM8k. Are the scores for models on the archived leaderboard raw scores, or are they normalised in some way / compared to a standard benchmark? If the latter, is there somewhere I can find details on the methodology?

clefourrier

Open LLM Leaderboard Archive org Nov 15, 2024

Hi! The scores here are all raw; we only added normalisation in v2 :)
The baseline scores (the "baseline" row) were taken, for each benchmark, from the paper that introduced it.
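
For readers wondering what normalising against a baseline would look like in practice, here is a minimal sketch. It assumes a simple rescaling where the baseline (for example, random guessing) maps to 0 and the maximum score maps to 100, with anything below the baseline clipped to 0. The function name, parameters, and example numbers are hypothetical, and this is not necessarily the exact procedure used in v2.

```python
def normalize_to_baseline(raw, baseline, max_score=100.0):
    """Rescale a raw score so that `baseline` maps to 0 and `max_score` to 100.

    Assumption: scores at or below the baseline are clipped to 0.
    """
    if max_score <= baseline:
        raise ValueError("max_score must be greater than baseline")
    if raw <= baseline:
        return 0.0
    return (raw - baseline) / (max_score - baseline) * 100.0


# Hypothetical example: a 4-choice multiple-choice task, where random
# guessing scores 25%. A raw accuracy of 62.5% sits halfway between
# the baseline and a perfect score.
example = normalize_to_baseline(62.5, baseline=25.0)
print(example)
```

On the archived leaderboard none of this applies: the numbers shown are the raw benchmark scores.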

clefourrier changed discussion status to closed Nov 15, 2024
