Defense · 2025

ForensicsSAM: Toward Robust and Unified Image Forgery Detection and Localization Resisting to Adversarial Attack

Rongxuan Peng, Shunquan Tan, Chenqi Kong, Anwei Luo, Alex C. Kot, Jiwu Huang



Published on arXiv: 2508.07402

Input Manipulation Attack

OWASP ML Top 10 — ML01

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

ForensicsSAM achieves state-of-the-art adversarial robustness across multiple IFDL benchmarks while maintaining competitive clean-image forgery detection and pixel-level localization performance

ForensicsSAM

Novel technique introduced


Parameter-efficient fine-tuning (PEFT) has emerged as a popular strategy for adapting large vision foundation models, such as the Segment Anything Model (SAM) and LLaVA, to downstream tasks like image forgery detection and localization (IFDL). However, existing PEFT-based approaches overlook their vulnerability to adversarial attacks. In this paper, we show that highly transferable adversarial images can be crafted solely via the upstream model, without accessing the downstream model or training data, significantly degrading IFDL performance. To address this, we propose ForensicsSAM, a unified IFDL framework with built-in adversarial robustness. Our design is guided by three key ideas: (1) To compensate for the lack of forgery-relevant knowledge in the frozen image encoder, we inject forgery experts into each transformer block to enhance its ability to capture forgery artifacts. These forgery experts are always activated and shared across all input images. (2) To detect adversarial images, we design a lightweight adversary detector that learns to capture structured, task-specific artifacts in the RGB domain, enabling reliable discrimination across various attack methods. (3) To resist adversarial attacks, we inject adversary experts into the global attention layers and MLP modules to progressively correct feature shifts induced by adversarial noise. These adversary experts are adaptively activated by the adversary detector, thereby avoiding unnecessary interference with clean images. Extensive experiments across multiple benchmarks demonstrate that ForensicsSAM achieves superior resistance to various adversarial attack methods, while also delivering state-of-the-art performance in image-level forgery detection and pixel-level forgery localization. The resource is available at https://github.com/siriusPRX/ForensicsSAM.


Key Contributions

  • Demonstrates that PEFT-based IFDL models built on vision foundation models are vulnerable to highly transferable adversarial attacks crafted solely from the upstream model without downstream model access
  • Proposes ForensicsSAM with forgery experts injected into each transformer block and a lightweight adversary detector that identifies structured adversarial artifacts in the RGB domain
  • Introduces adaptively activated adversary experts in global attention and MLP layers that progressively correct adversarial feature shifts while avoiding interference with clean images
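The gating design described above can be sketched in a few lines: a frozen block is augmented with an always-on low-rank forgery expert, plus an adversary expert whose contribution is switched on only when a lightweight detector flags the input. This is a minimal numpy sketch; all names, dimensions, the norm-threshold "detector", and the identity stand-in for the frozen MLP are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 64, 8  # hidden width and low-rank bottleneck (illustrative sizes)

def low_rank_expert(d, r):
    # Down-projection / up-projection pair, as in typical PEFT adapters.
    return rng.normal(scale=0.02, size=(d, r)), rng.normal(scale=0.02, size=(r, d))

A_f, B_f = low_rank_expert(D, R)  # forgery expert: always activated, shared
A_a, B_a = low_rank_expert(D, R)  # adversary expert: gated by the detector

def adversary_detector(x):
    # Stand-in for the paper's lightweight RGB-domain detector:
    # here a hypothetical norm threshold plays the role of "looks adversarial".
    return float(np.linalg.norm(x) > 10.0)

def block_forward(x, frozen_mlp):
    h = frozen_mlp(x)                              # frozen backbone block
    h = h + np.maximum(x @ A_f, 0.0) @ B_f         # forgery expert, always on
    g = adversary_detector(x)                      # 0.0 or 1.0 gate
    h = h + g * (np.maximum(x @ A_a, 0.0) @ B_a)   # adversary expert, gated
    return h

frozen_mlp = lambda x: x  # placeholder for the frozen transformer MLP
clean = rng.normal(size=D)
out_clean = block_forward(clean, frozen_mlp)
out_gated = block_forward(clean * 5.0, frozen_mlp)  # large-norm input trips the gate
```

On clean inputs the gate stays at zero, so the adversary expert adds nothing and only the forgery expert modifies the frozen features, matching the paper's goal of avoiding interference with clean images.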

🛡️ Threat Analysis

Input Manipulation Attack

The paper demonstrates that transferable adversarial perturbations crafted via the upstream model (SAM) degrade downstream IFDL performance without any access to the downstream model. It then proposes a defense built around an adversary detector and adversary experts that correct adversarial feature shifts — core adversarial robustness work.

Output Integrity Attack

The primary downstream task is image forgery detection and localization: verifying content integrity and authenticity by identifying tampered or manipulated images at both the image level and the pixel level. This places the work squarely in content-integrity and output-authenticity verification.


Details

Domains
vision
Model Types
transformer, VLM
Threat Tags
black_box, inference_time, digital
Datasets
CASIA, COVERAGE, Columbia, NIST16
Applications
image forgery detection, image forgery localization, digital forensics