SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models Paper • 2402.05044 • Published Feb 7
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26