shezamunir committed on
Commit 39d0cfd · verified · 1 Parent(s): 896c8b2

Update verifact_data.csv

Files changed (1)
  1. verifact_data.csv +25 -25
verifact_data.csv CHANGED
@@ -1,25 +1,25 @@
- tier,model,f1,precision,recall
- Overall,GPT4o,67.41,80.59,57.93
- FactBench,GPT4o,80.93,85.11,77.13
- Reddit,GPT4o,42.76,74.04,30.06
- Overall,Claude 3.5-Sonnet,63.65,78.37,53.58
- FactBench,Claude 3.5-Sonnet,75.68,83.28,69.35
- Reddit,Claude 3.5-Sonnet,42.90,71.25,30.69
- Overall,Gemini 1.5-Flash,64.10,80.72,53.16
- FactBench,Gemini 1.5-Flash,77.38,85.45,70.71
- Reddit,Gemini 1.5-Flash,40.26,73.87,27.67
- Overall,Llama3.1-8b,48.62,60.91,40.46
- FactBench,Llama3.1-8b,60.71,68.87,54.28
- Reddit,Llama3.1-8b,28.86,49.36,20.39
- Overall,Llama3.1-70b,55.12,68.09,46.30
- FactBench,Llama3.1-70b,65.83,76.05,58.00
- Reddit,Llama3.1-70b,38.61,56.54,29.31
- Overall,Llama3.1-405B,60.61,72.80,51.92
- FactBench,Llama3.1-405B,73.23,78.80,68.40
- Reddit,Llama3.1-405B,38.98,64.10,28.00
- Overall,Qwen2.5-8b,55.78,72.45,45.34
- FactBench,Qwen2.5-8b,69.23,77.18,58.66
- Reddit,Qwen2.5-8b,37.25,65.58,26.01
- Overall,Qwen2.5-32b,60.00,77.79,47.52
- FactBench,Qwen2.5-32b,71.31,82.74,62.77
- Reddit,Qwen2.5-32b,37.34,70.60,25.38
+ tier,model,FactBench,Reddit,Overall
+ F1,GPT4o,80.93,42.76,67.41
+ F1,Claude 3.5-Sonnet,75.68,42.90,63.65
+ F1,Gemini 1.5-Flash,77.38,40.26,64.10
+ F1,Llama3.1-8b,60.71,28.86,48.62
+ F1,Llama3.1-70b,65.83,38.61,55.12
+ F1,Llama3.1-405B,73.23,38.98,60.61
+ F1,Qwen2.5-8b,69.23,37.25,55.78
+ F1,Qwen2.5-32b,71.31,37.34,60.00
+ Recall,GPT4o,77.13,30.06,57.93
+ Recall,Claude 3.5-Sonnet,69.35,30.69,53.58
+ Recall,Gemini 1.5-Flash,70.71,27.67,53.16
+ Recall,Llama3.1-8b,54.28,20.39,40.46
+ Recall,Llama3.1-70b,58.00,29.31,46.30
+ Recall,Llama3.1-405B,68.40,28.00,51.92
+ Recall,Qwen2.5-8b,58.66,26.01,45.34
+ Recall,Qwen2.5-32b,62.77,25.38,47.52
+ Precision,GPT4o,85.11,74.04,80.59
+ Precision,Claude 3.5-Sonnet,83.28,71.25,78.37
+ Precision,Gemini 1.5-Flash,85.45,73.87,80.72
+ Precision,Llama3.1-8b,68.87,49.36,60.91
+ Precision,Llama3.1-70b,76.05,56.54,68.09
+ Precision,Llama3.1-405B,78.80,64.10,72.80
+ Precision,Qwen2.5-8b,77.18,65.58,72.45
+ Precision,Qwen2.5-32b,82.74,70.60,77.79
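
The values in this commit are unchanged; only the layout is pivoted, from one row per tier and model with metric columns to one row per metric and model with tier columns. A minimal sketch of that reshape follows, using only the Python standard library. It is not necessarily how the author produced the file: the function name `reshape` and the constants `OLD_CSV`, `METRICS`, and `TIERS` are hypothetical, and the input is a three-row excerpt of the removed lines rather than the full file.

```python
import csv
import io

# Excerpt of the old layout (rows copied from the removed lines of the diff).
OLD_CSV = """tier,model,f1,precision,recall
Overall,GPT4o,67.41,80.59,57.93
FactBench,GPT4o,80.93,85.11,77.13
Reddit,GPT4o,42.76,74.04,30.06
"""

# Row and column order used by the new file: (label in new file, old column name).
METRICS = [("F1", "f1"), ("Recall", "recall"), ("Precision", "precision")]
TIERS = ["FactBench", "Reddit", "Overall"]


def reshape(old_text):
    """Pivot long rows (tier, model, metrics) into one row per metric and model."""
    rows = list(csv.DictReader(io.StringIO(old_text)))
    # Preserve the first-seen order of models.
    models = list(dict.fromkeys(row["model"] for row in rows))
    # Index the old rows by (tier, model) for direct lookup.
    by_key = {(row["tier"], row["model"]): row for row in rows}

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # The commit keeps the header name "tier" even though the column now holds metric names.
    writer.writerow(["tier", "model"] + TIERS)
    for label, column in METRICS:
        for model in models:
            values = [by_key[(tier, model)][column] for tier in TIERS]
            writer.writerow([label, model] + values)
    return out.getvalue()


new_csv = reshape(OLD_CSV)
print(new_csv)
```

Values are kept as strings throughout, so trailing zeros such as `60.00` would survive the round trip instead of being reformatted as floats.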