garak: A Framework for Security Probing Large Language Models Paper • 2406.11036 • Published Jun 16, 2024
Semantic Consistency for Assuring Reliability of Large Language Models Paper • 2308.09138 • Published Aug 17, 2023
Representation noising effectively prevents harmful fine-tuning on LLMs Paper • 2405.14577 • Published May 23, 2024
Introducing v0.5 of the AI Safety Benchmark from MLCommons Paper • 2404.12241 • Published Apr 18, 2024
Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs Paper • 2010.15285 • Published Oct 28, 2020