tier,model,FactBench,Reddit,Overall
F1,GPT4o,80.93,42.76,67.41
F1,Claude 3.5-Sonnet,75.68,42.90,63.65
F1,Gemini 1.5-Flash,77.38,40.26,64.10
F1,Llama3.1-8b,60.71,28.86,48.62
F1,Llama3.1-70b,65.83,38.61,55.12
F1,Llama3.1-405B,73.23,38.98,60.61
F1,Qwen2.5-8b,69.23,37.25,55.78
F1,Qwen2.5-32b,71.31,37.34,60.00
Recall,GPT4o,77.13,30.06,57.93
Recall,Claude 3.5-Sonnet,69.35,30.69,53.58
Recall,Gemini 1.5-Flash,70.71,27.67,53.16
Recall,Llama3.1-8b,54.28,20.39,40.46
Recall,Llama3.1-70b,58.00,29.31,46.30
Recall,Llama3.1-405B,68.40,28.00,51.92
Recall,Qwen2.5-8b,58.66,26.01,45.34
Recall,Qwen2.5-32b,62.77,25.38,47.52
Precision,GPT4o,85.11,74.04,80.59
Precision,Claude 3.5-Sonnet,83.28,71.25,78.37
Precision,Gemini 1.5-Flash,85.45,73.87,80.72
Precision,Llama3.1-8b,68.87,49.36,60.91
Precision,Llama3.1-70b,76.05,56.54,68.09
Precision,Llama3.1-405B,78.80,64.10,72.80
Precision,Qwen2.5-8b,77.18,65.58,72.45
Precision,Qwen2.5-32b,82.74,70.60,77.79
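The F1 rows appear consistent with the standard F1 definition, the harmonic mean of the matching Precision and Recall rows: F1 = 2PR / (P + R). Below is a minimal sketch for checking this per model and split. It assumes the values are percentages and that the Overall F1 is computed from the Overall precision and recall rather than averaged across splits. The helper names `load_scores`, `harmonic_f1`, and `f1_gaps` are hypothetical. The embedded `TABLE` is a verbatim subset of the rows above (GPT4o and Llama3.1-8b), used only for illustration.

```python
import csv
import io

# Verbatim subset of the leaderboard rows (GPT4o and Llama3.1-8b only).
TABLE = """tier,model,FactBench,Reddit,Overall
F1,GPT4o,80.93,42.76,67.41
F1,Llama3.1-8b,60.71,28.86,48.62
Recall,GPT4o,77.13,30.06,57.93
Recall,Llama3.1-8b,54.28,20.39,40.46
Precision,GPT4o,85.11,74.04,80.59
Precision,Llama3.1-8b,68.87,49.36,60.91
"""

SPLITS = ("FactBench", "Reddit", "Overall")


def load_scores(text):
    """Parse the long-format CSV into {(tier, model): {split: score}}."""
    scores = {}
    for row in csv.DictReader(io.StringIO(text)):
        scores[(row["tier"], row["model"])] = {s: float(row[s]) for s in SPLITS}
    return scores


def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of precision and recall (same percentage scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_gaps(scores, model):
    """Per split: reported F1 minus F1 recomputed from the Precision and Recall rows."""
    p = scores[("Precision", model)]
    r = scores[("Recall", model)]
    f1 = scores[("F1", model)]
    return {s: f1[s] - harmonic_f1(p[s], r[s]) for s in SPLITS}


if __name__ == "__main__":
    scores = load_scores(TABLE)
    for model in ("GPT4o", "Llama3.1-8b"):
        gaps = f1_gaps(scores, model)
        print(model, {s: f"{g:+.3f}" for s, g in gaps.items()})
```

Because every value is reported to two decimals, a gap of up to about 0.01 is expected from rounding alone; a larger gap for some model would suggest that its F1 was computed differently or was mistranscribed.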