Commit 282d506 by Yotam-Perlitz
Parent(s): f5e30e4

add citations

Signed-off-by: Yotam-Perlitz <[email protected]>
app.py
CHANGED
@@ -494,7 +494,8 @@ with st.expander(label="Model scored by the aggragate"):
 )
 
 with st.expander(label="Citations"):
-st.code(
+st.code(
+r"""
 @misc{liu2023agentbenchevaluatingllmsagents,
 title={AgentBench: Evaluating LLMs as Agents},
 author={Xiao Liu and Hao Yu and Hanchen Zhang and Yifan Xu and Xuanyu Lei and Hanyu Lai and Yu Gu and Hangliang Ding and Kaiwen Men and Kejuan Yang and Shudan Zhang and Xiang Deng and Aohan Zeng and Zhengxiao Du and Chenhui Zhang and Sheng Shen and Tianjun Zhang and Yu Su and Huan Sun and Minlie Huang and Yuxiao Dong and Jie Tang},
@@ -503,7 +504,7 @@ with st.expander(label="Citations"):
 archivePrefix={arXiv},
 primaryClass={cs.AI},
 url={https://arxiv.org/abs/2308.03688},
-}
+}
 
 @software{Li_AlpacaEval_An_Automatic_2023,
 author = {Li, Xuechen and Zhang, Tianyi and Dubois, Yann and Taori, Rohan and Gulrajani, Ishaan and Guestrin, Carlos and Liang, Percy and Hashimoto, Tatsunori B.},
@@ -755,7 +756,8 @@ with st.expander(label="Citations"):
 year={2024}
 }
 
-"""
+"""
+)
 
 st.markdown(
 "BenchBench-Leaderboard complements our study, where we analyzed over 40 prominent benchmarks and introduced standardized practices to enhance the robustness and validity of benchmark evaluations through the [BenchBench Python package](#). "
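Note on the `r"""` prefix: the commit does not say why the citation block is passed as a raw string. One plausible reason is that BibTeX often contains backslash sequences that Python would otherwise treat as escapes. The sketch below illustrates that effect with a made-up entry (`example2024` is hypothetical, not from app.py). The helper `show_citations` and the `language=None` argument are assumptions for illustration; the diff itself only shows `st.code(` followed by the raw string.

```python
# Hypothetical BibTeX entry (not taken from app.py), chosen because it
# contains backslash sequences that Python can interpret as escapes.
RAW_ENTRY = r"""
@misc{example2024,
  title={A \textbf{bold} title},
  author={Ren\'e Example},
}
"""

# Same text without the r prefix: "\t" becomes a TAB character and "\'"
# collapses to a plain apostrophe, so the BibTeX is silently altered.
COOKED_ENTRY = """
@misc{example2024,
  title={A \textbf{bold} title},
  author={Ren\'e Example},
}
"""


def show_citations(st, bibtex):
    """Render BibTeX verbatim inside a collapsible "Citations" section.

    `st` is expected to be the imported streamlit module. It is passed in
    as a parameter (an assumption of this sketch) so the function can be
    exercised without Streamlit installed.
    """
    with st.expander(label="Citations"):
        # language=None asks Streamlit for plain monospace text instead of
        # the default Python syntax highlighting.
        st.code(bibtex, language=None)
```

In an app this would be called as `show_citations(st, RAW_ENTRY)` after `import streamlit as st`. Keeping the raw string means the text shown in the expander matches what a reader would paste into a `.bib` file.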