Update app.py
app.py CHANGED
@@ -181,7 +181,7 @@ with tab1:
181        <div class="metric" style="font-size:16px;">
182        <br/>
183        <p>
184 -      <strong> Factual Precision </strong> measures the ratio of supported units divided by all units averaged over model responses. <strong> Hallucination Score </strong> quantifies the incorrect or inconclusive contents within a model response, as described in the paper. We also provide statistics on the average length of the response in terms of the number of tokens, the average verifiable units existing in the model responses (<strong>Avg. # Units</strong>), the average number of units labelled as undecidable (<strong>Avg. # Undecided</strong>), and the average number of units labelled as unsupported (<strong>Avg. # Unsupported</strong>).
184 +      <strong> 🎯 Factual Precision </strong> measures the ratio of supported units divided by all units averaged over model responses. <strong> π Hallucination Score </strong> quantifies the incorrect or inconclusive contents within a model response, as described in the paper. We also provide statistics on the average length of the response in terms of the number of tokens, the average verifiable units existing in the model responses (<strong>Avg. # Units</strong>), the average number of units labelled as undecidable (<strong>Avg. # Undecided</strong>), and the average number of units labelled as unsupported (<strong>Avg. # Unsupported</strong>).
185        </p>
186        <p>
187        π for closed LLMs; π for open-weights LLMs; 🚨 for newly added models"
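
The changed line above describes the leaderboard columns: Factual Precision is the fraction of supported units over all verifiable units, averaged across model responses, alongside average counts of tokens, units, undecided units, and unsupported units. Below is a minimal sketch (not the app's actual code) of how those statistics could be computed; the input format and field names ("supported", "undecided", "unsupported", "num_tokens") are assumptions for illustration, and the Hallucination Score follows the paper's own formula, which is not reproduced here.

from statistics import mean

def leaderboard_stats(responses):
    """responses: list of dicts with per-response unit label counts and token length (assumed format)."""
    precisions = []                                   # supported / all units, per response
    n_units, n_undec, n_unsup, n_tokens = [], [], [], []
    for r in responses:
        total = r["supported"] + r["undecided"] + r["unsupported"]
        if total:                                     # skip responses with no verifiable units
            precisions.append(r["supported"] / total)
        n_units.append(total)
        n_undec.append(r["undecided"])
        n_unsup.append(r["unsupported"])
        n_tokens.append(r["num_tokens"])
    return {
        "Factual Precision": mean(precisions),        # ratio of supported to all units, averaged over responses
        "Avg. # Tokens": mean(n_tokens),
        "Avg. # Units": mean(n_units),
        "Avg. # Undecided": mean(n_undec),
        "Avg. # Unsupported": mean(n_unsup),
    }

# Example: two model responses with labelled units
print(leaderboard_stats([
    {"supported": 8, "undecided": 1, "unsupported": 1, "num_tokens": 120},
    {"supported": 5, "undecided": 0, "unsupported": 5, "num_tokens": 90},
]))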