Update verifact_data.csv
verifact_data.csv CHANGED +25 -25
@@ -1,25 +1,25 @@
-tier,model,
[old lines 2-25: removed content not captured]
+tier,model,FactBench,Reddit,Overall
+F1,GPT4o,80.93,42.76,67.41
+F1,Claude 3.5-Sonnet,75.68,42.90,63.65
+F1,Gemini 1.5-Flash,77.38,40.26,64.10
+F1,Llama3.1-8b,60.71,28.86,48.62
+F1,Llama3.1-70b,65.83,38.61,55.12
+F1,Llama3.1-405B,73.23,38.98,60.61
+F1,Qwen2.5-8b,69.23,37.25,55.78
+F1,Qwen2.5-32b,71.31,37.34,60.00
+Recall,GPT4o,77.13,30.06,57.93
+Recall,Claude 3.5-Sonnet,69.35,30.69,53.58
+Recall,Gemini 1.5-Flash,70.71,27.67,53.16
+Recall,Llama3.1-8b,54.28,20.39,40.46
+Recall,Llama3.1-70b,58.00,29.31,46.30
+Recall,Llama3.1-405B,68.40,28.00,51.92
+Recall,Qwen2.5-8b,58.66,26.01,45.34
+Recall,Qwen2.5-32b,62.77,25.38,47.52
+Precision,GPT4o,85.11,74.04,80.59
+Precision,Claude 3.5-Sonnet,83.28,71.25,78.37
+Precision,Gemini 1.5-Flash,85.45,73.87,80.72
+Precision,Llama3.1-8b,68.87,49.36,60.91
+Precision,Llama3.1-70b,76.05,56.54,68.09
+Precision,Llama3.1-405B,78.80,64.10,72.80
+Precision,Qwen2.5-8b,77.18,65.58,72.45
+Precision,Qwen2.5-32b,82.74,70.60,77.79
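
In the added rows the `tier` column holds the metric name (F1, Recall, Precision), so each model appears three times in long format. The following is a minimal sketch, assuming only the column layout shown in the diff, of regrouping those rows by model with Python's standard `csv` module. The names `SAMPLE` and `pivot_by_model` are hypothetical and introduced here for illustration; the sample contains only the three GPT4o rows, not the full file.

```python
import csv
import io
from collections import defaultdict

# Header plus the three GPT4o rows copied from the added side of the diff.
# Assumption: the real file keeps this exact column order.
SAMPLE = """tier,model,FactBench,Reddit,Overall
F1,GPT4o,80.93,42.76,67.41
Recall,GPT4o,77.13,30.06,57.93
Precision,GPT4o,85.11,74.04,80.59
"""


def pivot_by_model(csv_text):
    """Return {model: {metric: {benchmark: score}}} from the long-format CSV."""
    table = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        metric = row.pop("tier")   # "tier" stores the metric name
        model = row.pop("model")
        # The remaining columns are benchmark scores; convert them to floats.
        table[model][metric] = {name: float(value) for name, value in row.items()}
    return dict(table)


scores = pivot_by_model(SAMPLE)
print(scores["GPT4o"]["F1"]["Overall"])  # prints 67.41
```

To run this against the committed file instead of the sample, pass `open("verifact_data.csv", encoding="utf-8").read()` to `pivot_by_model`, assuming the file sits in the working directory.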