safearena-leaderboard / results.csv
Model,Safe Completion Rate,Harmful Completion Rate,Refusal Rate,Normalized Safety Score,License,Bias Completion Rate,Cybercrime Completion Rate,Harassment Completion Rate,Misinformation Completion Rate,Illegal Activity Completion Rate
GPT-4o,34.4,22.8,30.2,31.7,Proprietary,14.0,16.0,16.0,28.0,40.0
GPT-4o-Mini,18.4,14.0,36.5,35.7,Proprietary,6.0,8.0,14.0,24.0,18.0
Claude-3.5-Sonnet-202406,21.2,7.6,57.7,55.0,Proprietary,4.0,6.0,5.0,12.0,12.0
Llama-3.2-90B-Vision-Instruct,8.4,11.2,14.0,34.0,Llama License,22.0,8.0,10.0,14.0,2.0
Qwen-2-VL-72B,24.4,26.0,0.8,21.5,Qwen License,34.0,18.0,18.0,30.0,30.0