Creating Blank Canvas Against AI-enabled Image Forgery
Qi Song, Ziyuan Luo, Renjie Wan
Published on arXiv (2511.22237)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Frequency-aware adversarial perturbations fully suppress SAM's segmentation capability on protected images, enabling accurate localization of AIGC-tampered regions when the perturbation pattern is disrupted by editing.
Blank Canvas
Novel technique introduced
AIGC-based image editing technology has greatly simplified realistic image modification, posing serious risks of image forgery. This paper introduces a new approach to tampering detection using the Segment Anything Model (SAM). Instead of training SAM to identify tampered areas, we propose a novel strategy: the entire image is transformed into a blank canvas from the perspective of neural models, so that any modification to this blank canvas becomes noticeable to them. To realize this idea, we introduce adversarial perturbations that prevent SAM from "seeing anything", allowing it to identify forged regions once the image is tampered with. Due to SAM's powerful perceiving capabilities, naive adversarial attacks cannot completely tame it. To thoroughly deceive SAM and make it blind to the image, we introduce a frequency-aware optimization strategy, which further enhances tamper localization. Extensive experimental results demonstrate the effectiveness of our method.
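The core mechanism is a budget-constrained adversarial perturbation optimized to suppress the model's segmentation confidence everywhere. A minimal sketch of such a PGD-style loop is below; it attacks a toy differentiable surrogate (a fixed linear map plus sigmoid) rather than SAM's actual mask decoder, which would require the `segment_anything` checkpoints. All names here (`mask_confidence`, `blind_perturbation`) are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for SAM's per-pixel mask confidence: a fixed linear
# response followed by a sigmoid. Only the optimization loop matters.
W = rng.normal(size=(64, 64))

def mask_confidence(img):
    """Per-pixel 'segmentation confidence' of the surrogate model."""
    return 1.0 / (1.0 + np.exp(-W * img))

def blind_perturbation(img, eps=8 / 255, alpha=1 / 255, steps=40):
    """PGD under an L-inf budget that minimizes mean mask confidence,
    turning the image into a 'blank canvas' for the surrogate."""
    delta = np.zeros_like(img)
    for _ in range(steps):
        conf = mask_confidence(img + delta)
        # Analytic gradient of mean(sigmoid(W * (img + delta))) w.r.t. delta.
        grad = W * conf * (1.0 - conf) / img.size
        delta -= alpha * np.sign(grad)             # descend: suppress masks
        delta = np.clip(delta, -eps, eps)          # stay within the budget
        delta = np.clip(img + delta, 0, 1) - img   # keep pixels valid
    return delta

img = rng.uniform(0.2, 0.8, size=(64, 64))
delta = blind_perturbation(img)
before = mask_confidence(img).mean()
after = mask_confidence(img + delta).mean()   # lower: model 'sees' less
```

Against the real SAM, the loss would be taken over the mask decoder's outputs for a grid of prompts, but the budgeted sign-gradient structure is the same.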
Key Contributions
- Novel proactive forgery detection strategy that transforms images into a 'blank canvas' for neural models via adversarial perturbations, enabling tamper localization without retraining SAM
- Frequency-aware optimization strategy to comprehensively suppress SAM's perception across the full image, overcoming the limitations of naive adversarial attacks against a powerful segmentation model
- Demonstrated effectiveness at localizing AIGC-edited regions by exploiting the disruption of the protective perturbation pattern post-tampering
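The second contribution concerns shaping the perturbation across frequency bands rather than only in pixel space. The paper does not spell out its frequency-aware strategy here, so the following is a hypothetical sketch of one common realization: reweighting each gradient step in the Fourier domain to emphasize a chosen band before applying it.

```python
import numpy as np

def frequency_weighted_step(grad, low_freq_weight=2.0, cutoff=0.25):
    """Reweight a gradient step in the Fourier domain, amplifying
    low-frequency components. Illustrative only: the paper's actual
    frequency-aware optimization may differ."""
    H, W = grad.shape
    G = np.fft.fft2(grad)
    fy = np.fft.fftfreq(H)[:, None]           # per-row frequencies
    fx = np.fft.fftfreq(W)[None, :]           # per-column frequencies
    radius = np.sqrt(fy**2 + fx**2)           # radial frequency of each bin
    weight = np.where(radius < cutoff, low_freq_weight, 1.0)
    return np.real(np.fft.ifft2(G * weight))  # back to pixel space
```

Such a step would replace the raw `grad` inside a PGD loop, biasing the perturbation toward bands where the segmentation model is most sensitive.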
🛡️ Threat Analysis
The paper proposes a content-protection and forgery-detection scheme: adversarial perturbations render an image 'invisible' to SAM, so any AIGC-based tampering breaks the pattern and becomes detectable. This is a direct contribution to output integrity and content authenticity against AI-generated forgeries.
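The detection side of this scheme is simple: because the protected image yields no confident segmentation anywhere, any region where the model does respond marks where an edit destroyed the perturbation. A self-contained toy of that localization logic, using a hypothetical per-pixel confidence surrogate in place of SAM:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in confidence map for SAM (hypothetical surrogate): a fixed
# positive per-pixel gain. Only the detection logic is illustrated.
W = np.abs(rng.normal(size=(32, 32))) + 0.5

def confidence(img):
    return 1.0 / (1.0 + np.exp(-W * (img - 0.5)))

# 'Protected' image: every pixel's logit is negative, so the surrogate
# sees a blank canvas (confidence below 0.5 everywhere).
protected = np.full((32, 32), 0.5) - 0.1

# Simulate an AIGC edit that overwrites a region, destroying the
# protective perturbation there.
tampered = protected.copy()
tampered[8:16, 8:16] = 0.9

# Pixels where the model regains confidence localize the forgery.
mask = confidence(tampered) > 0.5
```

In the paper's setting the same comparison would be made on SAM's predicted masks, with no access to the original image required at detection time.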