Published on arXiv

2603.23178

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves high perceptual quality and strong robustness against compression, filtering, geometric transformations, and adversarial perturbations across multiple deepfake datasets

SAiW

Novel technique introduced


Deepfakes generated by modern generative models pose a serious threat to information integrity, digital identity, and public trust. Existing detection methods are largely reactive, attempting to identify manipulations after they occur and often failing to generalize across evolving generation techniques. This motivates the need for proactive mechanisms that secure media authenticity at the time of creation. In this work, we introduce SAiW, a Source-Attributed invisible Watermarking framework for proactive deepfake defense and media provenance verification. Unlike conventional watermarking methods that treat watermark payloads as generic signals, SAiW formulates watermark embedding as a source-conditioned representation learning problem, where the watermark identity encodes the originating source and modulates the embedding process to produce discriminative and traceable signatures. The framework integrates feature-wise linear modulation to inject source identity into the embedding network, enabling scalable multi-source watermark generation. A perceptual guidance module derived from human visual system priors ensures that watermark perturbations remain visually imperceptible while maintaining robustness. In addition, a dual-purpose forensic decoder simultaneously reconstructs the embedded watermark and performs source attribution, providing both automated verification and interpretable forensic evidence. Extensive experiments across multiple deepfake datasets demonstrate that SAiW achieves high perceptual quality while maintaining strong robustness against compression, filtering, noise, geometric transformations, and adversarial perturbations. By binding digital media to its origin through invisible yet verifiable markers, SAiW enables reliable authentication and source attribution, providing a scalable foundation for proactive deepfake defense and trustworthy media provenance.


Key Contributions

  • Source-conditioned watermark embedding using feature-wise linear modulation to encode originating source identity
  • Perceptual guidance module ensuring imperceptibility while maintaining robustness against compression, filtering, noise, and adversarial perturbations
  • Dual-purpose forensic decoder for simultaneous watermark reconstruction and source attribution
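The source-conditioned embedding above relies on feature-wise linear modulation (FiLM): a source identity vector is mapped to per-channel scale and shift parameters that modulate the embedder's feature maps, so different sources yield distinct, traceable watermark signatures. The sketch below illustrates the FiLM mechanism only; all function names, shapes, and the RNG stand-in for the learned conditioning network are illustrative assumptions, not the paper's implementation.

```python
# Minimal FiLM sketch: a source identity produces per-channel
# (gamma, beta) pairs that scale and shift the embedder's features.
import random

def film_params(source_id: int, num_channels: int):
    """Map a source identity to per-channel (gamma, beta) parameters.
    In SAiW a learned network would produce these; a seeded RNG
    stands in here so the mapping is deterministic per source."""
    rng = random.Random(source_id)
    gamma = [1.0 + 0.1 * rng.uniform(-1, 1) for _ in range(num_channels)]
    beta = [0.1 * rng.uniform(-1, 1) for _ in range(num_channels)]
    return gamma, beta

def film_modulate(features, gamma, beta):
    """Apply FiLM: every value in channel c is scaled by gamma[c]
    and shifted by beta[c]."""
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]

# Toy feature map: 3 channels x 4 spatial positions.
features = [[0.5, -0.2, 0.1, 0.0] for _ in range(3)]

# Two different source identities modulate the same features differently,
# which is what makes the resulting watermark signatures attributable.
g1, b1 = film_params(source_id=1, num_channels=3)
g2, b2 = film_params(source_id=2, num_channels=3)
out1 = film_modulate(features, g1, b1)
out2 = film_modulate(features, g2, b2)
print(out1 != out2)
```

The key design point is that conditioning happens inside the embedding network (per feature channel) rather than by concatenating an ID to the payload, which is what lets one network serve many sources.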

🛡️ Threat Analysis

Output Integrity Attack

SAiW watermarks model-generated content (deepfakes) to verify provenance and authenticate media outputs. This is content watermarking for output integrity and deepfake detection, not model IP protection.


Details

Domains
vision, generative
Model Types
GAN, diffusion
Threat Tags
inference_time
Applications
deepfake detection, media provenance, content authentication