Update app.py
app.py CHANGED
@@ -181,7 +181,7 @@ with tab1:
181        <div class="metric" style="font-size:16px;">
182        <br/>
183        <p>
184 -      <strong> Factual Precision </strong> measures the ratio of supported units divided by all units averaged over model responses. <strong> Hallucination Score </strong> quantifies the incorrect or inconclusive contents within a model response, as described in the paper. We also provide statistics on the average length of the response in terms of the number of tokens, the average verifiable units existing in the model responses (<strong>Avg. # Units</strong>), the average number of units labelled as undecidable (<strong>Avg. # Undecided</strong>), and the average number of units labelled as unsupported (<strong>Avg. # Unsupported</strong>).
184 +      <strong> 🎯 Factual Precision </strong> measures the ratio of supported units divided by all units averaged over model responses. <strong> π Hallucination Score </strong> quantifies the incorrect or inconclusive contents within a model response, as described in the paper. We also provide statistics on the average length of the response in terms of the number of tokens, the average verifiable units existing in the model responses (<strong>Avg. # Units</strong>), the average number of units labelled as undecidable (<strong>Avg. # Undecided</strong>), and the average number of units labelled as unsupported (<strong>Avg. # Unsupported</strong>).
185        </p>
186        <p>
187        π for closed LLMs; π for open-weights LLMs; 🚨 for newly added models"
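
The changed line above describes the leaderboard columns: Factual Precision is the fraction of supported units over all verifiable units, averaged across model responses, alongside average counts of tokens, units, undecided units, and unsupported units. Below is a minimal sketch (not the app's actual code) of how those statistics could be computed; the input format and field names ("supported", "undecided", "unsupported", "num_tokens") are assumptions for illustration, and the Hallucination Score follows the paper's own formula, which is not reproduced here.

from statistics import mean

def leaderboard_stats(responses):
    """responses: list of dicts with per-response unit label counts and token length (assumed format)."""
    precisions = []                                   # supported / all units, per response
    n_units, n_undec, n_unsup, n_tokens = [], [], [], []
    for r in responses:
        total = r["supported"] + r["undecided"] + r["unsupported"]
        if total:                                     # skip responses with no verifiable units
            precisions.append(r["supported"] / total)
        n_units.append(total)
        n_undec.append(r["undecided"])
        n_unsup.append(r["unsupported"])
        n_tokens.append(r["num_tokens"])
    return {
        "Factual Precision": mean(precisions),        # ratio of supported to all units, averaged over responses
        "Avg. # Tokens": mean(n_tokens),
        "Avg. # Units": mean(n_units),
        "Avg. # Undecided": mean(n_undec),
        "Avg. # Unsupported": mean(n_unsup),
    }

# Example: two model responses with labelled units
print(leaderboard_stats([
    {"supported": 8, "undecided": 1, "unsupported": 1, "num_tokens": 120},
    {"supported": 5, "undecided": 0, "unsupported": 5, "num_tokens": 90},
]))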