farimafatahi committed on
Commit 1f646d8 · verified · 1 Parent(s): 17d6ec9

Update app.py

Files changed (1)
  1. app.py +4 -4
app.py CHANGED
@@ -171,18 +171,18 @@ with tab1:
     # st.markdown('<div class="title">Leaderboard</div>', unsafe_allow_html=True)
     st.markdown('<div class="tab-content">', unsafe_allow_html=True)
 
-    st.markdown('Metrics Explanation')
-    st.markdown( '''
+    st.markdown('# Metrics Explanation')
+    st.markdown("""
     <div class="metric">
     <br/>
     <p style="font-size:16px;">
-    <strong> Factual Precision </strong> measures the ratio of supported units divided by all units averaged over model responses. <strong> Hallucination Score </strong> measures the degree of incorrect or inconclusive content units in model response, with details provided in the paper. We also provide statistics on the average number of unsupported unit (<strong>Avg. Unsupported</strong>), average number of units labelled as undecided (<strong>Avg. Undecided</strong>), Average length of response in terms of the number of tokens, and the average verifiable units existing in the model responses.
+    <strong> Factual Precision </strong> measures the ratio of supported units divided by all units averaged over model responses. <strong> Hallucination Score </strong> quantifies the incorrect or inconclusive contents within a model response, as described in the paper. We also provide statistics on the average number of units labelled as unsupported (<strong>Avg. # Unsupported</strong>), the average number of units labelled as undecidable (<strong>Avg. # Undecided</strong>), the average length of the response in terms of the number of tokens, and the average verifiable units existing in the model responses (<strong>Avg. # Units</strong>).
     </p>
     <p style="font-size:16px;">
     🔒 for closed LLMs; 🔑 for open-weights LLMs; 🚨 for newly added models"
     </p>
     </div>
-    ''',
+    """,
     unsafe_allow_html=True
     )
 
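For reference, the Factual Precision described in the hunk above reduces to a per-response ratio of supported units to all units, averaged over responses. Below is a minimal sketch under that reading; the function name, the label strings, and the input shape are hypothetical illustrations, not the leaderboard's actual code.

# Minimal, hypothetical sketch of Factual Precision as described above:
# per response, the fraction of verifiable units labelled "supported",
# averaged over all responses. Names and labels are illustrative only.

def factual_precision(responses):
    """Mean over responses of (# 'supported' units / # units in the response)."""
    ratios = []
    for units in responses:
        if not units:  # skip empty responses to avoid division by zero
            continue
        supported = sum(1 for label in units if label == "supported")
        ratios.append(supported / len(units))
    return sum(ratios) / len(ratios)

# Two responses with 3 and 2 verifiable units: (2/3 + 1/2) / 2 ≈ 0.583
print(factual_precision([
    ["supported", "supported", "unsupported"],
    ["supported", "undecided"],
]))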