Training-Free Color-Aware Adversarial Diffusion Sanitization for Diffusion Stegomalware Defense at Security Gateways
Published on arXiv
2512.24499
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
ADS reduces diffusion steganographic decoder success rates to near zero with minimal perceptual impact, outperforming standard content transformations in the security-utility trade-off.
Adversarial Diffusion Sanitization (ADS)
Novel technique introduced
The rapid expansion of generative AI has normalized large-scale synthetic media creation, enabling new forms of covert communication. Recent generative steganography methods, particularly those based on diffusion models, can embed high-capacity payloads without fine-tuning or auxiliary decoders, creating significant challenges for detection and remediation. Coverless diffusion-based techniques are difficult to counter because they generate image carriers directly from secret data, enabling attackers to deliver stegomalware for command-and-control, payload staging, and data exfiltration while bypassing detectors that rely on cover-stego discrepancies. This work introduces Adversarial Diffusion Sanitization (ADS), a training-free defense for security gateways that neutralizes hidden payloads rather than detecting them. ADS employs an off-the-shelf pretrained denoiser as a differentiable proxy for diffusion-based decoders and incorporates a color-aware, quaternion-coupled update rule to reduce artifacts under strict distortion limits. Under a practical threat model and in evaluation against the state-of-the-art diffusion steganography method Pulsar, ADS drives decoder success rates to near zero with minimal perceptual impact. Results demonstrate that ADS provides a favorable security-utility trade-off compared to standard content transformations, offering an effective mitigation strategy against diffusion-driven steganography.
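The abstract describes ADS as an iterative, training-free procedure: perturb the image within a strict distortion limit, guided by gradients from an off-the-shelf denoiser used as a differentiable proxy for the decoder. The paper's exact objective and proxy are not reproduced here; the sketch below shows only the generic projected-gradient skeleton such a sanitizer could use, with `grad_fn` standing in for the proxy's input gradient (a hypothetical name, assuming white-box access to the proxy).

```python
import numpy as np

def sanitize(x, grad_fn, eps=4/255, step=1/255, iters=10):
    """PGD-style sanitization sketch (not the paper's algorithm).

    x       -- image in [0, 1], shape (H, W, 3)
    grad_fn -- gradient of the proxy decoding loss w.r.t. the image
               (assumption: a differentiable denoiser proxy is available)
    eps     -- L-inf distortion budget, the "strict distortion limit"
    """
    x0 = x.copy()
    for _ in range(iters):
        g = grad_fn(x)
        x = x + step * np.sign(g)           # ascend the proxy loss
        x = np.clip(x, x0 - eps, x0 + eps)  # project back into the budget
        x = np.clip(x, 0.0, 1.0)            # keep a valid image
    return x
```

The projection step is what enforces the security-utility trade-off the paper measures: the decoder-disrupting perturbation can never exceed `eps` per channel, so perceptual impact stays bounded by construction.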
Key Contributions
- Training-free Adversarial Diffusion Sanitization (ADS) that neutralizes hidden payloads in diffusion-generated images without requiring a dedicated trained model
- Color-aware, quaternion-coupled adversarial update rule that minimizes perceptual distortion while disrupting steganographic decoding
- Evaluation against Pulsar (state-of-the-art coverless diffusion steganography) demonstrating near-zero decoder success rates under realistic distortion budgets
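The second contribution treats the three color channels jointly rather than perturbing R, G, and B independently. One standard way to couple channels with quaternions, shown below as a minimal sketch, is to embed each RGB gradient as a pure quaternion and rotate it by a unit quaternion, so the update is distributed across channels in a single joint step. This is an illustrative coupling under that assumption, not the paper's exact update rule; `coupled_step` and `q` are hypothetical names.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternion arrays with shape (..., 4)."""
    w1, x1, y1, z1 = np.moveaxis(a, -1, 0)
    w2, x2, y2, z2 = np.moveaxis(b, -1, 0)
    return np.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ], axis=-1)

def coupled_step(grad_rgb, q):
    """Rotate each per-pixel RGB gradient by unit quaternion q
    (v -> q v q*), coupling the color channels in one joint update."""
    zeros = np.zeros(grad_rgb.shape[:-1] + (1,))
    v = np.concatenate([zeros, grad_rgb], axis=-1)   # pure quaternions
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    rotated = quat_mul(quat_mul(np.broadcast_to(q, v.shape), v),
                       np.broadcast_to(q_conj, v.shape))
    return rotated[..., 1:]                          # back to RGB vectors
```

Because quaternion rotation preserves the norm of each pixel's gradient, this kind of coupling redirects perturbation energy between channels without amplifying it, which is consistent with the stated goal of minimizing perceptual distortion.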
🛡️ Threat Analysis
The paper defends against AI-generated content (diffusion model outputs) being weaponized as a covert channel for stegomalware. ADS sanitizes image outputs by neutralizing hidden payloads embedded via diffusion-based steganography, framing the threat as a content integrity and provenance problem. The threat model centers on malicious use of generative AI outputs, and the defense restores output integrity at a security gateway.