A Provable Energy-Guided Test-Time Defense Boosting Adversarial Robustness of Large Vision-Language Models
Mujtaba Hussain Mirza 1, Antonio D'Orazio 2, Odelia Melamed 1, Iacopo Masi 1
Published on arXiv: 2603.26984
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Provides a provable test-time defense against adversarial attacks for LVLMs and classifiers, without requiring model retraining
Novel technique introduced
ET3 (Energy-Guided Test-Time Transformation)
Despite the rapid progress in multimodal models and Large Visual-Language Models (LVLM), they remain highly susceptible to adversarial perturbations, raising serious concerns about their reliability in real-world use. While adversarial training has become the leading paradigm for building models that are robust to adversarial attacks, Test-Time Transformations (TTT) have emerged as a promising strategy to boost robustness at inference. In light of this, we propose Energy-Guided Test-Time Transformation (ET3), a lightweight, training-free defense that enhances the robustness by minimizing the energy of the input samples. Our method is grounded in a theory that proves our transformation succeeds in classification under reasonable assumptions. We present extensive experiments demonstrating that ET3 provides a strong defense for classifiers, zero-shot classification with CLIP, and also for boosting the robustness of LVLMs in tasks such as Image Captioning and Visual Question Answering. Code is available at github.com/OmnAI-Lab/Energy-Guided-Test-Time-Defense.
Key Contributions
- Energy-Guided Test-Time Transformation (ET3) defense with theoretical proof of classification success under reasonable assumptions
- Training-free method that works across image classification, zero-shot CLIP classification, and LVLM tasks (captioning, VQA)
- Demonstrates superiority over existing test-time defenses across multiple datasets and model architectures
🛡️ Threat Analysis
Defends against adversarial perturbations at inference time by transforming inputs to minimize their energy, with theoretical guarantees of correct classification under reasonable assumptions.
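To make the idea of "transforming inputs to minimize energy" concrete, here is a minimal sketch. It is not the authors' implementation. This summary does not define the energy, so the sketch assumes the common energy-based reading of a classifier, E(x) = -log Σ_k exp(z_k(x)) with z the logits. The toy linear classifier (`W`, `b`), the function names, the L∞ budget `eps`, the step size, and the step count are all hypothetical choices made for illustration.

```python
import math

def logits(W, b, x):
    """Logits of a toy linear classifier: z_k = W[k] . x + b[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def energy(W, b, x):
    """Assumed energy (not taken from the source): E(x) = -log sum_k exp(z_k(x))."""
    z = logits(W, b, x)
    m = max(z)
    return -(m + math.log(sum(math.exp(v - m) for v in z)))

def energy_grad(W, b, x):
    """Analytic gradient for the linear model: dE/dx = -sum_k softmax(z)_k * W[k]."""
    p = softmax(logits(W, b, x))
    return [-sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def minimize_energy(W, b, x0, eps=0.5, step=0.1, n_steps=20):
    """Gradient descent on E(x), kept inside an L-infinity ball of radius eps around x0.

    eps, step and n_steps are illustrative values, not settings from the paper.
    """
    x = list(x0)
    for _ in range(n_steps):
        g = energy_grad(W, b, x)
        # Move against the energy gradient.
        x = [xi - step * gi for xi, gi in zip(x, g)]
        # Project back so the transformed input stays close to the original.
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x

# Hypothetical two-class, two-feature example.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x_in = [0.2, 0.1]
x_out = minimize_energy(W, b, x_in)
```

In this toy setting the transformed input `x_out` has lower energy than `x_in` while staying within `eps` of it in every coordinate. With a real network the gradient would come from automatic differentiation rather than a closed form.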