
Training-Free Color-Aware Adversarial Diffusion Sanitization for Diffusion Stegomalware Defense at Security Gateways

Vladimir Frants, Sos Agaian

0 citations · 75 references · arXiv


Published on arXiv · 2512.24499

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

ADS reduces diffusion steganographic decoder success rates to near zero with minimal perceptual impact, outperforming standard content transformations in the security-utility trade-off.

Adversarial Diffusion Sanitization (ADS)

Novel technique introduced


The rapid expansion of generative AI has normalized large-scale synthetic media creation, enabling new forms of covert communication. Recent generative steganography methods, particularly those based on diffusion models, can embed high-capacity payloads without fine-tuning or auxiliary decoders, creating significant challenges for detection and remediation. Coverless diffusion-based techniques are difficult to counter because they generate image carriers directly from secret data, enabling attackers to deliver stegomalware for command-and-control, payload staging, and data exfiltration while bypassing detectors that rely on cover-stego discrepancies. This work introduces Adversarial Diffusion Sanitization (ADS), a training-free defense for security gateways that neutralizes hidden payloads rather than detecting them. ADS employs an off-the-shelf pretrained denoiser as a differentiable proxy for diffusion-based decoders and incorporates a color-aware, quaternion-coupled update rule to reduce artifacts under strict distortion limits. Under a practical threat model and in evaluation against the state-of-the-art diffusion steganography method Pulsar, ADS drives decoder success rates to near zero with minimal perceptual impact. Results demonstrate that ADS provides a favorable security-utility trade-off compared to standard content transformations, offering an effective mitigation strategy against diffusion-driven steganography.
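The abstract describes ADS as iterative adversarial updates against a differentiable decoder proxy under a strict distortion limit. A minimal sketch of such a loop, assuming a PGD-style projected sign-gradient update and an L∞ budget (the function names, `proxy_grad` callable, and budget values are illustrative, not the paper's exact procedure):

```python
import numpy as np

def sanitize_pgd(image, proxy_grad, eps=8/255, alpha=2/255, steps=10):
    """PGD-style sanitization sketch (assumed form of the ADS loop).

    image:      float array in [0, 1]
    proxy_grad: callable returning the gradient of a differentiable
                decoder-proxy loss w.r.t. the image (e.g. obtained by
                backpropagating through a pretrained denoiser)
    The perturbation is projected into an L_inf ball of radius eps,
    keeping distortion under a strict budget while disrupting decoding.
    """
    x = image.copy()
    for _ in range(steps):
        g = proxy_grad(x)
        x = x + alpha * np.sign(g)                 # ascend the proxy loss
        x = np.clip(x, image - eps, image + eps)   # enforce distortion budget
        x = np.clip(x, 0.0, 1.0)                   # keep valid pixel range
    return x
```

Because the loop only needs gradients from an off-the-shelf pretrained denoiser, no dedicated model has to be trained, which matches the "training-free" framing above.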


Key Contributions

  • Training-free Adversarial Diffusion Sanitization (ADS) that neutralizes hidden payloads in diffusion-generated images without requiring a dedicated trained model
  • Color-aware, quaternion-coupled adversarial update rule that minimizes perceptual distortion while disrupting steganographic decoding
  • Evaluation against Pulsar (state-of-the-art coverless diffusion steganography) demonstrating near-zero decoder success rates under realistic distortion budgets
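The quaternion-coupled update rule in the second contribution is not spelled out here; one plausible reading, sketched below under stated assumptions, treats an RGB gradient as a pure quaternion and rotates it about a unit color axis, so all three channels are perturbed coherently rather than independently (the rotation-based coupling, axis choice, and function names are assumptions for illustration):

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternion arrays, last axis = (w, x, y, z)."""
    pw, px, py, pz = np.moveaxis(p, -1, 0)
    qw, qx, qy, qz = np.moveaxis(q, -1, 0)
    return np.stack([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ], axis=-1)

def coupled_step(grad_rgb, axis=(1.0, 1.0, 1.0), theta=0.1):
    """Rotate an RGB gradient jointly about a unit color axis.

    grad_rgb: (..., 3) per-pixel gradient. Conjugation by the unit
    rotor couples the channels: each pixel's update direction is a
    rigid rotation of its gradient, which preserves magnitude and
    keeps the color shift coherent across R, G, B.
    """
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)
    r = np.concatenate([[np.cos(theta/2)], np.sin(theta/2) * u])  # unit rotor
    r_conj = r * np.array([1.0, -1.0, -1.0, -1.0])
    g = np.concatenate([np.zeros(grad_rgb.shape[:-1] + (1,)), grad_rgb],
                       axis=-1)                    # embed as pure quaternion
    rotated = hamilton(hamilton(np.broadcast_to(r, g.shape), g),
                       np.broadcast_to(r_conj, g.shape))
    return rotated[..., 1:]                        # back to RGB vector part
```

A coupling of this kind is one way a color-aware rule could reduce chroma artifacts relative to per-channel sign updates, since the rotated step cannot push individual channels in conflicting directions.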

🛡️ Threat Analysis

Output Integrity Attack

The paper defends against AI-generated content (diffusion model outputs) being weaponized as covert channels for stegomalware. ADS sanitizes image outputs by neutralizing hidden payloads embedded by diffusion-based steganography — a content integrity and provenance problem. The threat model centers on malicious use of generative AI outputs, and the defense restores output integrity at a security gateway.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
white_box, inference_time, digital
Datasets
Pulsar benchmark
Applications
security gateways, covert communication disruption, stegomalware defense