Update app.py
app.py CHANGED
@@ -171,7 +171,21 @@ with tab1:
     # st.markdown('<div class="title">Leaderboard</div>', unsafe_allow_html=True)
     st.markdown('<div class="tab-content">', unsafe_allow_html=True)
 
-    st.markdown('
+    st.markdown('Metrics Explanation')
+    st.markdown('''
+    <div class="metric">
+    <br/>
+    <p style="font-size:16px;">
+    <strong>Factual Precision</strong> measures the ratio of supported units to all units, averaged over model responses. <strong>Hallucination Score</strong> measures the degree of incorrect or inconclusive content units in a model response, with details provided in the paper. We also report the average number of unsupported units (<strong>Avg. Unsupported</strong>), the average number of units labelled as undecided (<strong>Avg. Undecided</strong>), the average response length in tokens, and the average number of verifiable units in the model responses.
+    </p>
+    <p style="font-size:16px;">
+    π for closed LLMs; π for open-weights LLMs; π¨ for newly added models
+    </p>
+    </div>
+    ''',
+    unsafe_allow_html=True
+    )
+
     st.markdown('@Farima populate here')
 
     st.markdown("""
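For reference, here is a minimal sketch (not part of the commit) of how the per-response statistics described in the added text could be aggregated, assuming each verifiable unit in a response is labelled supported, unsupported, or undecided. The function name, input layout, and label strings are illustrative assumptions, and the exact Hallucination Score formula defined in the paper is omitted.

# Hypothetical aggregation of the leaderboard statistics described above.
# The per-unit labels ("supported" / "unsupported" / "undecided"), the input
# layout, and the function name are assumptions for illustration only.
from statistics import mean

def aggregate_stats(responses):
    """responses: list of dicts, e.g.
    {"tokens": 312, "units": [{"label": "supported"}, {"label": "undecided"}, ...]}"""
    precision, unsupported, undecided, lengths, verifiable = [], [], [], [], []
    for r in responses:
        labels = [u["label"] for u in r["units"]]
        total = len(labels)
        # Factual Precision per response: supported units / all units
        precision.append(labels.count("supported") / total if total else 0.0)
        unsupported.append(labels.count("unsupported"))
        undecided.append(labels.count("undecided"))
        lengths.append(r["tokens"])
        verifiable.append(total)
    # Each statistic is averaged over model responses
    return {
        "Factual Precision": mean(precision),
        "Avg. Unsupported": mean(unsupported),
        "Avg. Undecided": mean(undecided),
        "Avg. Response Length (tokens)": mean(lengths),
        "Avg. Verifiable Units": mean(verifiable),
    }

A dict like the one returned here could then be rendered into the leaderboard table that the tab displays.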