Update app.py
add arxiv link
app.py
CHANGED
@@ -36,7 +36,7 @@ st.markdown(
 )
 
 st.markdown(
-    "We are excited to share the BenchBench-Leaderboard, a crucial component of our comprehensive research on Benchmark Agreement Testing (BAT) [work](
+    "We are excited to share the BenchBench-Leaderboard, a crucial component of our comprehensive research on Benchmark Agreement Testing (BAT) [work](https://arxiv.org/abs/2407.13696) -- Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation."
     "This leaderboard is a meta-benchmark that ranks benchmarks based on their agreement with the crowd harnessing many different references. "
 )
 
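
One detail of the added line is easy to miss: Python joins adjacent string literals with no separator, and the new literal ends in `Evaluation."` with no trailing space, so the rendered text would appear to run straight into "This leaderboard ...". The sketch below rebuilds the two literals as plain strings to show the effect. The names `ARXIV_URL`, `first`, `second`, `as_committed`, and `with_space` are illustrative assumptions that do not appear in app.py, and the spaced variant is only one possible adjustment, not something this commit contains.

```python
# Sketch only: all names here are hypothetical and not taken from app.py.
ARXIV_URL = "https://arxiv.org/abs/2407.13696"

# First literal from the "+" line of the diff (split across lines for readability;
# the joined text is identical to the single literal in the commit).
first = (
    "We are excited to share the BenchBench-Leaderboard, a crucial component of our "
    "comprehensive research on Benchmark Agreement Testing (BAT) "
    f"[work]({ARXIV_URL}) -- Benchmark Agreement Testing Done Right: "
    "A Guide for LLM Benchmark Evaluation."
)

# Second literal, unchanged by the commit.
second = (
    "This leaderboard is a meta-benchmark that ranks benchmarks based on their "
    "agreement with the crowd harnessing many different references. "
)

# Adjacent literals inside st.markdown( ... ) behave like plain concatenation.
as_committed = first + second

# Hypothetical variant: a separating space between the two sentences.
with_space = first + " " + second

print(as_committed)
print(with_space)
```

If the missing space matters for the rendered page, adding a trailing space inside the first literal would have the same effect as the `with_space` variant.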