Yotam-Perlitz committed
Commit 282d506
1 Parent(s): f5e30e4

add citations


Signed-off-by: Yotam-Perlitz <[email protected]>

Files changed (1)
  1. app.py +5 -3
app.py CHANGED
@@ -494,7 +494,8 @@ with st.expander(label="Model scored by the aggragate"):
     )
 
 with st.expander(label="Citations"):
-    st.code("""
+    st.code(
+        r"""
     @misc{liu2023agentbenchevaluatingllmsagents,
           title={AgentBench: Evaluating LLMs as Agents},
           author={Xiao Liu and Hao Yu and Hanchen Zhang and Yifan Xu and Xuanyu Lei and Hanyu Lai and Yu Gu and Hangliang Ding and Kaiwen Men and Kejuan Yang and Shudan Zhang and Xiang Deng and Aohan Zeng and Zhengxiao Du and Chenhui Zhang and Sheng Shen and Tianjun Zhang and Yu Su and Huan Sun and Minlie Huang and Yuxiao Dong and Jie Tang},
@@ -503,7 +504,7 @@ with st.expander(label="Citations"):
          archivePrefix={arXiv},
          primaryClass={cs.AI},
          url={https://arxiv.org/abs/2308.03688},
-    }
+    }
 
    @software{Li_AlpacaEval_An_Automatic_2023,
      author = {Li, Xuechen and Zhang, Tianyi and Dubois, Yann and Taori, Rohan and Gulrajani, Ishaan and Guestrin, Carlos and Liang, Percy and Hashimoto, Tatsunori B.},
@@ -755,7 +756,8 @@ with st.expander(label="Citations"):
        year={2024}
    }
 
-    """)
+        """
+    )
 
     st.markdown(
         "BenchBench-Leaderboard complements our study, where we analyzed over 40 prominent benchmarks and introduced standardized practices to enhance the robustness and validity of benchmark evaluations through the [BenchBench Python package](#). "