defense arXiv Feb 23, 2026
Aleksandr Gushchin, Dmitriy S. Vatolin, Anastasia Antsiferova · ISP RAS Research Center for Trusted Artificial Intelligence · MSU Institute for Artificial Intelligence +2 more
Defends image quality assessment models against white-box adversarial attacks via Anchored Adversarial Training with a ranking loss and clean anchor samples
Input Manipulation Attack vision
Full-Reference image quality assessment (FR IQA) is important for image compression, restoration, and generative modeling, yet current neural metrics remain slow and vulnerable to adversarial perturbations. We present BiRQA, a compact FR IQA metric that processes four fast, complementary features within a bidirectional multiscale pyramid. A bottom-up attention module injects fine-scale cues into coarse levels through an uncertainty-aware gate, while a top-down cross-gating block routes semantic context back to high resolution. To enhance robustness, we introduce Anchored Adversarial Training, a theoretically grounded strategy that uses clean "anchor" samples and a ranking loss to bound pointwise prediction error under attacks. On five public FR IQA benchmarks, BiRQA outperforms or matches the previous state of the art (SOTA) while running about 3× faster than previous SOTA models. Under unseen white-box attacks, it raises SROCC (Spearman rank-order correlation coefficient) on KADID-10k from 0.30–0.57 to 0.60–0.84, a substantial robustness gain. To our knowledge, BiRQA is the only FR IQA model that combines competitive accuracy with real-time throughput and strong adversarial resilience.
cnn transformer · ISP RAS Research Center for Trusted Artificial Intelligence · MSU Institute for Artificial Intelligence · Lomonosov Moscow State University +1 more
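One way to read an anchored ranking loss is as a pairwise hinge penalty. Each prediction on an attacked image must keep its ground-truth ordering relative to the predictions on clean anchor images. The sketch below is hypothetical and is not BiRQA's published formulation. The function names `hinge` and `anchored_ranking_loss`, the 0.05 margin, and the all-pairs pairing are illustrative assumptions. Plain Python floats stand in for model outputs and MOS labels.

```python
from typing import Sequence


def hinge(s_i: float, s_j: float, sign: float, margin: float) -> float:
    """max(0, margin - sign * (s_i - s_j)); sign=+1 means s_i should exceed s_j."""
    return max(0.0, margin - sign * (s_i - s_j))


def anchored_ranking_loss(
    adv_preds: Sequence[float],
    adv_mos: Sequence[float],
    anchor_preds: Sequence[float],
    anchor_mos: Sequence[float],
    margin: float = 0.05,
) -> float:
    """Hypothetical anchored ranking loss.

    Every prediction on an attacked image is compared with every prediction
    on a clean anchor. A penalty applies whenever the predicted order disagrees
    with the ground-truth (MOS) order, or when the two predictions are closer
    than `margin`.
    """
    total, count = 0.0, 0
    for p_adv, y_adv in zip(adv_preds, adv_mos):
        for p_anc, y_anc in zip(anchor_preds, anchor_mos):
            if y_adv == y_anc:
                continue  # tied labels carry no ordering information
            sign = 1.0 if y_adv > y_anc else -1.0
            total += hinge(p_adv, p_anc, sign, margin)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    anchor_mos = [0.2, 0.5, 0.8]       # clean anchors spanning the score range
    anchor_preds = [0.22, 0.48, 0.79]  # model outputs on those clean anchors
    # Attacked image with true MOS 0.6:
    print(anchored_ranking_loss([0.65], [0.6], anchor_preds, anchor_mos))  # zero: order respected
    print(anchored_ranking_loss([0.90], [0.6], anchor_preds, anchor_mos))  # positive: ranked above the 0.8 anchor
```

Here is one intuition for how anchors can bound error, offered as an assumption and not as the paper's proof. Suppose an attacked prediction is correctly ordered against anchors whose true scores bracket its own. Suppose also that the clean predictions are accurate. The attacked prediction is then confined between neighboring anchor predictions, so its pointwise error is limited by roughly the gap between those anchors.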
attack arXiv Feb 6, 2026
Haipeng Li, Rongxuan Peng, Anwei Luo et al. · Shenzhen University · Nanyang Technological University +2 more
Adversarial perturbations that evade AI-generated content detectors by steering shared CLIP embeddings toward authentic anchors
Input Manipulation Attack Output Integrity Attack vision multimodal
The rapid advancement of AI-generated content (AIGC) technologies poses significant challenges for authenticity assessment. However, existing evaluation protocols largely overlook anti-forensics attacks and therefore cannot establish the robustness of state-of-the-art AIGC detectors in real-world applications. To bridge this gap, we propose ForgeryEraser, a framework that executes universal anti-forensics attacks without access to the target AIGC detectors. We reveal an adversarial vulnerability that stems from the systemic reliance on vision-language models (VLMs) such as CLIP as shared backbones: downstream AIGC detectors inherit the feature space of these publicly accessible models. Instead of traditional logit-based optimization, we design a multi-modal guidance loss. It drives forged-image embeddings in the VLM feature space toward text-derived authentic anchors, erasing forgery traces, while repelling them from forgery anchors. Extensive experiments show that ForgeryEraser substantially degrades the performance of advanced AIGC detectors on both global-synthesis and local-editing benchmarks. Moreover, ForgeryEraser induces explainable forensic models to produce, for forged images, explanations consistent with those given for authentic images. Our code will be made publicly available.
vlm diffusion gan transformer · Shenzhen University · Nanyang Technological University · Shenzhen MSU-BIT University +1 more
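A guidance loss of this kind can be sketched with cosine similarity. The loss falls as a forged image's embedding moves toward the authentic anchor and away from the forgery anchor. This toy version is hypothetical in several ways. It treats the embedding itself as the optimization variable and uses 2-D vectors in place of CLIP image and text embeddings. The actual attack presumably perturbs image pixels and backpropagates through the VLM image encoder under a perturbation budget, but that is an assumption, since the abstract gives no details. The names `guidance_loss`, `cosine_grad`, and `guidance_step` are illustrative. The gradient of cos(e, a) is computed analytically, so no autodiff library is needed.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: Vec, b: Vec) -> float:
    return dot(a, b) / (norm(a) * norm(b))


def guidance_loss(e: Vec, authentic: Vec, forgery: Vec) -> float:
    """Hypothetical guidance loss: low when e points toward the authentic
    anchor and away from the forgery anchor."""
    return cosine(e, forgery) - cosine(e, authentic)


def cosine_grad(e: Vec, a: Vec) -> Vec:
    """Analytic gradient of cos(e, a) w.r.t. e: a/(|e||a|) - cos(e, a) * e/|e|^2."""
    ne, na = norm(e), norm(a)
    c = dot(e, a) / (ne * na)
    return [ai / (ne * na) - c * ei / (ne * ne) for ei, ai in zip(e, a)]


def guidance_step(e: Vec, authentic: Vec, forgery: Vec, lr: float = 0.1) -> Vec:
    """One gradient-descent step on guidance_loss, treating e as the variable."""
    g_forgery = cosine_grad(e, forgery)
    g_authentic = cosine_grad(e, authentic)
    grad = [gf - ga for gf, ga in zip(g_forgery, g_authentic)]
    return [ei - lr * gi for ei, gi in zip(e, grad)]


if __name__ == "__main__":
    authentic = [1.0, 0.0]  # stand-in for a text-derived "authentic" anchor
    forgery = [0.0, 1.0]    # stand-in for a "forgery" anchor
    e = [0.2, 1.0]          # forged-image embedding, initially near the forgery anchor
    for _ in range(50):
        e = guidance_step(e, authentic, forgery)
    print(f"cos to authentic: {cosine(e, authentic):.3f}, cos to forgery: {cosine(e, forgery):.3f}")
```

The gradient of cosine similarity is orthogonal to e. Each step therefore mostly changes the embedding's direction rather than its length, which suits a feature space where similarity, not magnitude, drives the downstream detector.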