farimafatahi committed on
Commit 35382f2 · verified · 1 Parent(s): 1f646d8

Update app.py

Files changed (1)
  1. app.py +5 -7
app.py CHANGED
@@ -44,7 +44,7 @@ st.markdown(
         color: #555;
     }
 
-    .header {
+    .header, .metric {
         align-items: left;
         font-family: 'Arial', sans-serif; /* or use a similar sans-serif font */
         margin-bottom: 20px;
@@ -173,12 +173,12 @@ with tab1:
 
     st.markdown('# Metrics Explanation')
     st.markdown("""
-    <div class="metric">
+    <div class="metric" style="font-size:16px;">
     <br/>
-    <p style="font-size:16px;">
-    <strong> Factual Precision </strong> measures the ratio of supported units divided by all units averaged over model responses. <strong> Hallucination Score </strong> quantifies the incorrect or inconclusive contents within a model response, as described in the paper. We also provide statistics on the average number of units labelled as unsupported (<strong>Avg. # Unsupported</strong>), the average number of units labelled as undecidable (<strong>Avg. # Undecided</strong>), the average length of the response in terms of the number of tokens, and the average verifiable units existing in the model responses (<strong>Avg. # Units</strong>).
+    <p>
+    <strong> Factual Precision </strong> measures the ratio of supported units divided by all units averaged over model responses. <strong> Hallucination Score </strong> quantifies the incorrect or inconclusive contents within a model response, as described in the paper. We also provide statistics on the average length of the response in terms of the number of tokens, the average verifiable units existing in the model responses (<strong>Avg. # Units</strong>), the average number of units labelled as undecidable (<strong>Avg. # Undecided</strong>), and the average number of units labelled as unsupported (<strong>Avg. # Unsupported</strong>).
     </p>
-    <p style="font-size:16px;">
+    <p>
    🔒 for closed LLMs; 🔑 for open-weights LLMs; 🚨 for newly added models"
     </p>
     </div>
@@ -186,8 +186,6 @@ with tab1:
     unsafe_allow_html=True
     )
 
-    st.markdown('@Farima populate here')
-
     st.markdown("""
     <style>
     /* Selectbox text */
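For reference, the metric description edited above can be stated compactly. The sketch below is not code from app.py; it assumes each model response comes with per-unit labels in {"supported", "unsupported", "undecided"}, and the function names are hypothetical. It shows how Factual Precision (mean over responses of supported units divided by all units) and the per-response label averages could be computed.

```python
# Minimal sketch (assumed helpers, not part of app.py): compute Factual
# Precision and per-response label averages from per-unit labels.

def factual_precision(responses):
    """Mean over responses of (# supported units / # total units)."""
    ratios = []
    for units in responses:  # units: list of label strings for one response
        if units:
            ratios.append(sum(1 for u in units if u == "supported") / len(units))
    return sum(ratios) / len(ratios) if ratios else 0.0

def avg_label_count(responses, label):
    """Average number of units carrying a given label per response."""
    if not responses:
        return 0.0
    return sum(sum(1 for u in units if u == label) for units in responses) / len(responses)

# Example: two labelled model responses
responses = [
    ["supported", "supported", "unsupported"],
    ["supported", "undecided"],
]
print(factual_precision(responses))              # (2/3 + 1/2) / 2 ≈ 0.583
print(avg_label_count(responses, "unsupported")) # 0.5  -> Avg. # Unsupported
print(avg_label_count(responses, "undecided"))   # 0.5  -> Avg. # Undecided
```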