| Model | Source | v1 metric | v2 metric |
|---|---|---|---|
| OpenAI: GPT-4.5 (Preview) | Proprietary Model | 100% | 97% |
| OpenAI: o3 Mini High | Proprietary Model | 100% | 96% |
| OpenAI: o3 Mini | Proprietary Model | 100% | 96% |
| OpenAI: GPT-4o | Proprietary Model | 99.09% | 95% |
| OpenAI: GPT-4o-mini | Proprietary Model | 99.09% | 97% |
| Anthropic: Claude 3.5 Sonnet | Proprietary Model | 99.09% | 97% |
| Anthropic: Claude 3.5 Haiku | Proprietary Model | 100% | 97% |
| Anthropic: Claude 3.7 Sonnet | Proprietary Model | 99.09% | 98% |
| Google: Gemma 3 27B | Open Source | 98.18% | 95% |
| Google: Gemini Flash 2.0 | Proprietary Model | 100% | 99% |
| Google: Gemini 2.0 Flash Lite | Proprietary Model | 100% | 97% |
| DeepSeek: R1 | Open Source | 100% | 98% |
| DeepSeek: DeepSeek V3 | Open Source | 100% | 97% |
| Mistral: Mistral Small 3.1 24B | Open Source | 100% | 97% |
| Mistral: Mistral Small 3 | Open Source | 99.09% | 97% |
| Mistral: Mistral Large 2411 | Open Source | 99.09% | 96% |
| Meta: Llama 3.3 70B Instruct | Open Source | 100% | 97% |
| Meta: Llama 3.2 3B Instruct | Open Source | 78.18% | 75% |
| Qwen: QwQ 32B | Open Source | 100% | 96% |
| Microsoft: Phi 4 | Proprietary Model | 100% | 97% |
| Microsoft: Phi-3.5 Mini 128K Instruct | Open Source | 99.09% | 97% |
| Microsoft: Phi-3 Mini 128K Instruct | Open Source | 98.18% | 98% |