tier,model,FactBench,Reddit,Overall
F1,GPT4o,80.93,42.76,67.41
F1,Claude 3.5-Sonnet,75.68,42.90,63.65
F1,Gemini 1.5-Flash,77.38,40.26,64.10
F1,Llama3.1-8b,60.71,28.86,48.62
F1,Llama3.1-70b,65.83,38.61,55.12
F1,Llama3.1-405B,73.23,38.98,60.61
F1,Qwen2.5-8b,69.23,37.25,55.78
F1,Qwen2.5-32b,71.31,37.34,60.00
Recall,GPT4o,77.13,30.06,57.93
Recall,Claude 3.5-Sonnet,69.35,30.69,53.58
Recall,Gemini 1.5-Flash,70.71,27.67,53.16
Recall,Llama3.1-8b,54.28,20.39,40.46
Recall,Llama3.1-70b,58.00,29.31,46.30
Recall,Llama3.1-405B,68.40,28.00,51.92
Recall,Qwen2.5-8b,58.66,26.01,45.34
Recall,Qwen2.5-32b,62.77,25.38,47.52
Precision,GPT4o,85.11,74.04,80.59
Precision,Claude 3.5-Sonnet,83.28,71.25,78.37
Precision,Gemini 1.5-Flash,85.45,73.87,80.72
Precision,Llama3.1-8b,68.87,49.36,60.91
Precision,Llama3.1-70b,76.05,56.54,68.09
Precision,Llama3.1-405B,78.80,64.10,72.80
Precision,Qwen2.5-8b,77.18,65.58,72.45
Precision,Qwen2.5-32b,82.74,70.60,77.79
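The long format above (one row per tier/model pair) can be reshaped into a nested per-metric lookup with the standard library alone. A minimal sketch: the data is embedded inline rather than read from a file, and the function name `pivot_by_tier` is a hypothetical choice, not something defined in the source.

```python
import csv
import io

# The results table from the text, in its original long CSV format.
RESULTS_CSV = """\
tier,model,FactBench,Reddit,Overall
F1,GPT4o,80.93,42.76,67.41
F1,Claude 3.5-Sonnet,75.68,42.90,63.65
F1,Gemini 1.5-Flash,77.38,40.26,64.10
F1,Llama3.1-8b,60.71,28.86,48.62
F1,Llama3.1-70b,65.83,38.61,55.12
F1,Llama3.1-405B,73.23,38.98,60.61
F1,Qwen2.5-8b,69.23,37.25,55.78
F1,Qwen2.5-32b,71.31,37.34,60.00
Recall,GPT4o,77.13,30.06,57.93
Recall,Claude 3.5-Sonnet,69.35,30.69,53.58
Recall,Gemini 1.5-Flash,70.71,27.67,53.16
Recall,Llama3.1-8b,54.28,20.39,40.46
Recall,Llama3.1-70b,58.00,29.31,46.30
Recall,Llama3.1-405B,68.40,28.00,51.92
Recall,Qwen2.5-8b,58.66,26.01,45.34
Recall,Qwen2.5-32b,62.77,25.38,47.52
Precision,GPT4o,85.11,74.04,80.59
Precision,Claude 3.5-Sonnet,83.28,71.25,78.37
Precision,Gemini 1.5-Flash,85.45,73.87,80.72
Precision,Llama3.1-8b,68.87,49.36,60.91
Precision,Llama3.1-70b,76.05,56.54,68.09
Precision,Llama3.1-405B,78.80,64.10,72.80
Precision,Qwen2.5-8b,77.18,65.58,72.45
Precision,Qwen2.5-32b,82.74,70.60,77.79
"""

def pivot_by_tier(csv_text):
    """Reshape the long table into {tier: {model: {dataset: score}}}."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tier = row.pop("tier")
        model = row.pop("model")
        table.setdefault(tier, {})[model] = {k: float(v) for k, v in row.items()}
    return table

scores = pivot_by_tier(RESULTS_CSV)
print(scores["F1"]["GPT4o"]["Overall"])  # → 67.41
```

This keeps each metric tier (F1, Recall, Precision) as its own block, so per-dataset comparisons (e.g. FactBench vs. Reddit precision) become single dictionary lookups.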