
Training-Free Color-Aware Adversarial Diffusion Sanitization for Diffusion Stegomalware Defense at Security Gateways

Vladimir Frants, Sos Agaian

0 citations · 75 references · arXiv


Published on arXiv · 2512.24499

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

ADS reduces diffusion steganographic decoder success rates to near zero with minimal perceptual impact, outperforming standard content transformations in the security-utility trade-off.

Adversarial Diffusion Sanitization (ADS)

Novel technique introduced


The rapid expansion of generative AI has normalized large-scale synthetic media creation, enabling new forms of covert communication. Recent generative steganography methods, particularly those based on diffusion models, can embed high-capacity payloads without fine-tuning or auxiliary decoders, creating significant challenges for detection and remediation. Coverless diffusion-based techniques are difficult to counter because they generate image carriers directly from secret data, enabling attackers to deliver stegomalware for command-and-control, payload staging, and data exfiltration while bypassing detectors that rely on cover-stego discrepancies. This work introduces Adversarial Diffusion Sanitization (ADS), a training-free defense for security gateways that neutralizes hidden payloads rather than detecting them. ADS employs an off-the-shelf pretrained denoiser as a differentiable proxy for diffusion-based decoders and incorporates a color-aware, quaternion-coupled update rule to reduce artifacts under strict distortion limits. Under a practical threat model and in evaluation against the state-of-the-art diffusion steganography method Pulsar, ADS drives decoder success rates to near zero with minimal perceptual impact. Results demonstrate that ADS provides a favorable security-utility trade-off compared to standard content transformations, offering an effective mitigation strategy against diffusion-driven steganography.
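The abstract describes ADS as iterative adversarial updates against a differentiable decoder proxy under a strict distortion limit. A minimal sketch of such a loop, assuming a PGD-style projected sign-gradient update and an L∞ budget (the function names, `proxy_grad` callable, and budget values are illustrative, not the paper's exact procedure):

```python
import numpy as np

def sanitize_pgd(image, proxy_grad, eps=8/255, alpha=2/255, steps=10):
    """PGD-style sanitization sketch (assumed form of the ADS loop).

    image:      float array in [0, 1]
    proxy_grad: callable returning the gradient of a differentiable
                decoder-proxy loss w.r.t. the image (e.g. obtained by
                backpropagating through a pretrained denoiser)
    The perturbation is projected into an L_inf ball of radius eps,
    keeping distortion under a strict budget while disrupting decoding.
    """
    x = image.copy()
    for _ in range(steps):
        g = proxy_grad(x)
        x = x + alpha * np.sign(g)                 # ascend the proxy loss
        x = np.clip(x, image - eps, image + eps)   # enforce distortion budget
        x = np.clip(x, 0.0, 1.0)                   # keep valid pixel range
    return x
```

Because the loop only needs gradients from an off-the-shelf pretrained denoiser, no dedicated model has to be trained, which matches the "training-free" framing above.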


Key Contributions

  • Training-free Adversarial Diffusion Sanitization (ADS) that neutralizes hidden payloads in diffusion-generated images without requiring a dedicated trained model
  • Color-aware, quaternion-coupled adversarial update rule that minimizes perceptual distortion while disrupting steganographic decoding
  • Evaluation against Pulsar (state-of-the-art coverless diffusion steganography) demonstrating near-zero decoder success rates under realistic distortion budgets
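The quaternion-coupled update rule in the second contribution is not spelled out here; one plausible reading, sketched below under stated assumptions, treats an RGB gradient as a pure quaternion and rotates it about a unit color axis, so all three channels are perturbed coherently rather than independently (the rotation-based coupling, axis choice, and function names are assumptions for illustration):

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternion arrays, last axis = (w, x, y, z)."""
    pw, px, py, pz = np.moveaxis(p, -1, 0)
    qw, qx, qy, qz = np.moveaxis(q, -1, 0)
    return np.stack([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ], axis=-1)

def coupled_step(grad_rgb, axis=(1.0, 1.0, 1.0), theta=0.1):
    """Rotate an RGB gradient jointly about a unit color axis.

    grad_rgb: (..., 3) per-pixel gradient. Conjugation by the unit
    rotor couples the channels: each pixel's update direction is a
    rigid rotation of its gradient, which preserves magnitude and
    keeps the color shift coherent across R, G, B.
    """
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)
    r = np.concatenate([[np.cos(theta/2)], np.sin(theta/2) * u])  # unit rotor
    r_conj = r * np.array([1.0, -1.0, -1.0, -1.0])
    g = np.concatenate([np.zeros(grad_rgb.shape[:-1] + (1,)), grad_rgb],
                       axis=-1)                    # embed as pure quaternion
    rotated = hamilton(hamilton(np.broadcast_to(r, g.shape), g),
                       np.broadcast_to(r_conj, g.shape))
    return rotated[..., 1:]                        # back to RGB vector part
```

A coupling of this kind is one way a color-aware rule could reduce chroma artifacts relative to per-channel sign updates, since the rotated step cannot push individual channels in conflicting directions.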

🛡️ Threat Analysis

Output Integrity Attack

The paper defends against AI-generated content (diffusion model outputs) being weaponized as covert channels for stegomalware. ADS sanitizes image outputs by neutralizing hidden payloads embedded by diffusion-based steganography — a content integrity and provenance problem. The threat model centers on malicious use of generative AI outputs, and the defense restores output integrity at a security gateway.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
white_box, inference_time, digital
Datasets
Pulsar benchmark
Applications
security gateways, covert communication disruption, stegomalware defense