
Keep It Real: Challenges in Attacking Compression-Based Adversarial Purification

Samuel Räber, Till Aczel, Andreas Plesner, Roger Wattenhofer



Published on arXiv (arXiv:2508.05489)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

High-realism learned compression models resist strong adaptive adversarial attacks because their reconstructions stay distributionally aligned with natural images, while low-realism compression defenses are fully broken; this robustness is not an artifact of gradient masking.


Previous work has suggested that preprocessing images through lossy compression can defend against adversarial perturbations, but comprehensive attack evaluations have been lacking. In this paper, we construct strong white-box and adaptive attacks against various compression models and identify a critical challenge for attackers: high realism in reconstructed images significantly increases attack difficulty. Through rigorous evaluation across multiple attack scenarios, we demonstrate that compression models capable of producing realistic, high-fidelity reconstructions are substantially more resistant to our attacks. In contrast, low-realism compression models can be broken. Our analysis reveals that this is not due to gradient masking. Rather, realistic reconstructions maintaining distributional alignment with natural images seem to offer inherent robustness. This work highlights a significant obstacle for future adversarial attacks and suggests that developing more effective techniques to overcome realism represents an essential challenge for comprehensive security evaluation.


Key Contributions

  • Rigorous white-box and adaptive attack evaluation (PGD + EoT) against a diverse set of compression-based adversarial purification defenses, filling gaps in prior evaluation protocols
  • Discovery that high-realism compression models are substantially more resistant to strong adaptive attacks while low-realism models are fully broken
  • Analysis ruling out gradient masking as the explanation, attributing robustness instead to distributional alignment of realistic reconstructions with natural image statistics
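To make the attack protocol concrete, below is a minimal sketch of L∞ PGD with Expectation over Transformation (EoT) against a stochastic purification step. Everything here is a hypothetical stand-in, not the paper's models: the "purifier" is dithered quantization mimicking lossy compression, the classifier is a random linear head, and the quantizer gradient uses a straight-through estimator, a common adaptive-attack choice for non-differentiable preprocessing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 10-class linear classifier on a flattened 64-pixel "image".
W = rng.normal(size=(10, 64))

def purify(x, rng):
    """Stand-in for a lossy compression model: quantize to 8 levels with
    random dither (the stochastic component that EoT averages over)."""
    dither = rng.uniform(-0.5 / 8, 0.5 / 8, size=x.shape)
    return np.clip(np.round((x + dither) * 8) / 8, 0.0, 1.0)

def loss_and_grad(x, y, rng):
    """Cross-entropy loss and its input gradient, treating the quantizer as
    identity on the backward pass (straight-through estimator)."""
    z = W @ purify(x, rng)
    z -= z.max()                         # stabilize softmax
    p = np.exp(z) / np.exp(z).sum()
    loss = -np.log(p[y] + 1e-12)
    g_z = p.copy()
    g_z[y] -= 1.0                        # d loss / d logits
    return loss, W.T @ g_z               # straight-through: d purify/dx ≈ I

def pgd_eot(x0, y, eps=0.05, alpha=0.01, steps=40, eot_samples=8):
    """Untargeted L_inf PGD, averaging gradients over EoT draws of the
    stochastic purifier before each sign step."""
    x = x0.copy()
    for _ in range(steps):
        g = np.mean([loss_and_grad(x, y, rng)[1]
                     for _ in range(eot_samples)], axis=0)
        x = np.clip(x + alpha * np.sign(g), x0 - eps, x0 + eps)  # ascend loss
        x = np.clip(x, 0.0, 1.0)
    return x

x0 = rng.uniform(size=64)   # random clean "image"
y = 3                       # its (assumed) true label
x_adv = pgd_eot(x0, y)
```

The paper's point is that this recipe reliably breaks low-realism purifiers, whereas high-realism reconstructions blunt it even with EoT averaging and straight-through gradients in place.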

🛡️ Threat Analysis

Input Manipulation Attack

The paper constructs white-box and adaptive adversarial evasion attacks against compression-based input purification defenses, analyzing whether adversarial perturbations survive lossy compression preprocessing at inference time. The entire threat model consists of adversarial example attacks on image classifiers.


Details

Domains
vision
Model Types
cnn, generative
Threat Tags
white_box, inference_time, digital
Datasets
ImageNet
Applications
image classification, adversarial purification