EigenShield: Causal Subspace Filtering via Random Matrix Theory for Adversarially Robust Vision-Language Models
Abstract
Vision-Language Models (VLMs) inherit the adversarial vulnerabilities of Large Language Models (LLMs), which are further exacerbated by their multimodal nature. Existing defenses, including adversarial training, input transformations, and heuristic detection, are computationally expensive, architecture-dependent, and fragile against adaptive attacks. We introduce EigenShield, an inference-time defense that leverages Random Matrix Theory to quantify adversarial disruptions in high-dimensional VLM representations. Unlike prior methods that rely on empirical heuristics, EigenShield employs the spiked covariance model to detect structured spectral deviations. Using a Robustness-based Nonconformity Score (RbNS) and quantile-based thresholding, it separates causal eigenvectors, which encode semantic information, from correlational eigenvectors, which are susceptible to adversarial artifacts. By projecting embeddings onto the causal subspace, EigenShield filters adversarial noise without modifying model parameters or requiring adversarial training. This architecture-independent, attack-agnostic approach significantly reduces the attack success rate, establishing spectral analysis as a principled alternative to conventional defenses. Our results demonstrate that EigenShield consistently outperforms existing defenses, including adversarial training, UNIGUARD, and CIDER.
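The abstract describes a three-step pipeline: eigendecompose the covariance of VLM embeddings, score the eigenvectors, and project onto the retained "causal" subspace. Below is a minimal sketch of that pipeline in NumPy. Since the excerpt does not give the exact form of RbNS, the distance of each eigenvalue beyond the Marchenko-Pastur bulk edge is used here as a stand-in score, and all function and variable names (`causal_subspace_projector`, `rbns_scores`, the 0.9 quantile) are illustrative assumptions, not the paper's API.

```python
# Minimal sketch of causal-subspace filtering in the spirit of EigenShield.
# Assumptions (not from the paper): eigenvalue distance beyond the
# Marchenko-Pastur bulk edge stands in for the Robustness-based
# Nonconformity Score; sigma^2 is estimated from the median eigenvalue.
import numpy as np

def causal_subspace_projector(embeddings, quantile=0.9):
    """Return a projector onto eigenvectors flagged as 'causal' (spiked).

    embeddings: (n_samples, dim) matrix of VLM embeddings.
    """
    n, d = embeddings.shape
    X = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = X.T @ X / n                          # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues

    # Marchenko-Pastur bulk edge for white noise of variance sigma^2:
    # lambda_+ = sigma^2 * (1 + sqrt(d/n))^2. Eigenvalues above this edge
    # are structured "spikes" under the spiked covariance model.
    sigma2 = np.median(eigvals)                # heuristic noise estimate
    mp_edge = sigma2 * (1.0 + np.sqrt(d / n)) ** 2

    # Stand-in nonconformity score: how far each eigenvalue exceeds the bulk.
    rbns_scores = np.maximum(eigvals - mp_edge, 0.0)

    # Quantile-based thresholding: keep eigenvectors whose score lies in the
    # upper tail; these span the causal subspace.
    tau = np.quantile(rbns_scores, quantile)
    causal = eigvecs[:, rbns_scores > tau]     # (dim, k) causal basis

    return causal @ causal.T                   # projector P = V V^T

def filter_embedding(z, projector):
    """Project a single embedding onto the causal subspace."""
    return projector @ z
```

The key property this sketch illustrates is that the projector is estimated once from embedding statistics and applied as a fixed linear map, so no model weights change, which is what makes the defense inference-time and architecture-independent.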
Community
Excited to share our latest work on EigenShield, a novel inference-time defense that enhances the robustness of Vision-Language Models against adversarial attacks. Our approach leverages Random Matrix Theory to filter out adversarial noise by distinguishing causal eigenvectors from correlational ones.
Key Highlights:
- Causal Subspace Filtering: Uses the spiked covariance model and a Robustness-based Nonconformity Score to separate meaningful signal from noise.
- Inference-Time Defense: Improves model security without retraining, making it architecture-independent and computationally efficient (see the usage sketch after this list).
- Robust Performance: Demonstrates significant reductions in attack success rates and harmful content generation across various VLM architectures.
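For concreteness, here is a hypothetical inference-time usage sketch, reusing `causal_subspace_projector` and `filter_embedding` from the sketch above. The calibration data is simulated as isotropic noise plus one planted signal direction; all names and sizes are placeholders and do not reflect the paper's actual API.

```python
# Hypothetical usage: fit the projector once on calibration embeddings,
# then filter each incoming embedding before the language model consumes
# it -- no retraining or weight changes. Reuses causal_subspace_projector
# and filter_embedding from the sketch above.
import numpy as np

rng = np.random.default_rng(0)
d, n = 512, 2048

# Simulated calibration embeddings: isotropic noise plus a rank-1
# "semantic" spike that the spiked covariance model should flag as causal.
u = rng.normal(size=d)
u /= np.linalg.norm(u)
calib = rng.normal(size=(n, d)) + 5.0 * rng.normal(size=(n, 1)) * u

P = causal_subspace_projector(calib, quantile=0.9)  # fit once, offline

incoming = rng.normal(size=d)          # one (possibly attacked) embedding
clean = filter_embedding(incoming, P)  # projected onto the causal subspace
```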