Spaces: launch/ExpertLongBench
JieRuan committed on Jun 10

Commit 840ff89 (verified) · 1 Parent(s): e78a002

Update src/streamlit_app.py

Files changed (1)

src/streamlit_app.py CHANGED

@@ -63,7 +63,7 @@ def load_data(path):
 # one page description
-st.markdown("## Leaderboard")
+st.markdown("## 🏆 Leaderboard")
 # st.markdown("**Leaderboard:** higher scores shaded green; best models bolded.")
 tiers = ['F1', 'Accuracy']
@@ -142,7 +142,7 @@ pipeline_image = Image.open("src/pipeline.png")
 buffered2 = BytesIO()
 pipeline_image.save(buffered2, format="PNG")
 img_data_pipeline = base64.b64encode(buffered2.getvalue()).decode("utf-8")
-st.markdown("## Abstract")
+st.markdown("## 🧠 Abstract")
 st.write(
 """
 The paper introduces ExpertLongBench, an expert-level benchmark containing 11 tasks from 9 domains that reflect realistic expert workflows and applications.
@@ -159,7 +159,7 @@ We benchmark 11 large language models (LLMs) and analyze components in CLEAR, sh
 )
-st.markdown("## Pipeline")
+st.markdown("## 🧰 Evaluation Pipeline")
 st.markdown(
 f"""
 <div class="logo-container" style="display:flex; justify-content: center;">