attack 2025

Targeted Pooled Latent-Space Steganalysis Applied to Generative Steganography, with a Fix

Etienne Levecque ¹, Aurélien Noirault ², Tomáš Pevný ³, Jan Butora ², Patrick Bas ², Rémi Cogranne ¹

¹ University of Technology of Troyes

² University of Lille

³ Czech Technical University

0 citations · 13 references · arXiv

Published on arXiv

2510.12414

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

A Likelihood Ratio Test on the norm of inverted latent vectors successfully detects a previously 'image-space-undetectable' steganographic scheme in latent diffusion models; randomly sampling the latent norm before generation restores full undetectability.

Latent-Space Pooled LRT Steganalysis

Novel technique introduced

Steganographic schemes dedicated to generated images modify the seed vector in the latent space to embed a message. Whereas most steganalysis methods attempt to detect the embedding in the image space, this paper proposes to perform steganalysis in the latent space by modeling the statistical distribution of the norm of the latent vector. Specifically, we analyze the practical security of a scheme proposed by Hu et al. for latent diffusion models, which is both robust and practically undetectable when steganalysis is performed on generated images. We show that after embedding, the Stego (latent) vector is distributed on a hypersphere while the Cover vector is i.i.d. Gaussian. By going from the image space to the latent space, we show that it is possible to model the norm of the vector in the latent space under the Cover or Stego hypothesis as Gaussian distributions with different variances. A Likelihood Ratio Test is then derived to perform pooled steganalysis. The impact of the potential knowledge of the prompt and the number of diffusion steps is also studied. Additionally, we show how, by randomly sampling the norm of the latent vector before generation, the initial Stego scheme becomes undetectable in the latent space.
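The pooled test described above can be illustrated with a minimal Python sketch. It assumes the Gaussian norm model from the abstract and nothing more: the function name `pooled_llr`, the latent dimension, and every mean and standard deviation below are hypothetical placeholders, not values from the paper. Giving the Stego norm the smaller spread is an assumption made for illustration (a vector constrained to a hypersphere has a fixed norm before generation), and the zero threshold is likewise illustrative.

```python
import numpy as np

def pooled_llr(norms, mu_c, sigma_c, mu_s, sigma_s):
    """Pooled log-likelihood ratio log p_stego / p_cover over a bag of norms.

    Each norm is modeled as Gaussian under both hypotheses; the per-image
    log-ratios are summed, which treats the images as independent.
    """
    r = np.asarray(norms, dtype=float)
    # The constant -0.5 * log(2 * pi) cancels in the ratio and is omitted.
    log_p_stego = -np.log(sigma_s) - 0.5 * ((r - mu_s) / sigma_s) ** 2
    log_p_cover = -np.log(sigma_c) - 0.5 * ((r - mu_c) / sigma_c) ** 2
    return float(np.sum(log_p_stego - log_p_cover))

# Hypothetical parameters (assumptions for illustration only).
d = 4 * 64 * 64               # assumed latent dimensionality
mu = np.sqrt(d)               # assumed common mean of the norm
sigma_c, sigma_s = 0.7, 0.1   # assumed Cover / Stego standard deviations

# Synthetic bags of 50 norms drawn from each assumed model.
rng = np.random.default_rng(0)
cover_norms = rng.normal(mu, sigma_c, size=50)
stego_norms = rng.normal(mu, sigma_s, size=50)

llr_cover = pooled_llr(cover_norms, mu, sigma_c, mu, sigma_s)
llr_stego = pooled_llr(stego_norms, mu, sigma_c, mu, sigma_s)

# Decide "Stego" when the pooled statistic exceeds a threshold (0 here).
print("cover bag flagged:", llr_cover > 0.0)
print("stego bag flagged:", llr_stego > 0.0)
```

In practice the threshold would be set to reach a target false-alarm rate, and the model parameters would be estimated from norms of inverted latent vectors rather than chosen by hand.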

Key Contributions

Demonstrates that steganographic embedding in latent diffusion models distributes the stego latent vector on a hypersphere (vs. i.i.d. Gaussian for cover), making it statistically detectable via a Likelihood Ratio Test on the Frobenius norm of the inverted latent vector
Derives a pooled LRT-based steganalysis detector that breaks the practical undetectability of Hu et al.'s scheme when analysis is performed in the latent space rather than the image space
Proposes a fix — randomly sampling the latent norm before generation — that restores stego undetectability in the latent space while preserving robustness
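The proposed fix can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the helper `resample_norm` is hypothetical, and drawing the new norm from a chi distribution with as many degrees of freedom as the latent dimension is an assumption, chosen because that is the distribution of the norm of an i.i.d. standard Gaussian vector. The sketch keeps the direction of the Stego vector unchanged and replaces only its length.

```python
import numpy as np

def resample_norm(z, rng):
    """Rescale z to a random norm while keeping its direction (hypothetical helper).

    The new norm is the square root of a chi-square draw with d degrees of
    freedom, i.e. the norm of a d-dimensional standard Gaussian vector.
    """
    z = np.asarray(z, dtype=float)
    d = z.size
    direction = z / np.linalg.norm(z)
    new_norm = np.sqrt(rng.chisquare(df=d))
    return direction * new_norm

rng = np.random.default_rng(0)
d = 1024  # assumed latent dimensionality, for illustration only

# Stand-in for a Stego latent: a vector placed on the sphere of radius sqrt(d).
z_stego = rng.standard_normal(d)
z_stego *= np.sqrt(d) / np.linalg.norm(z_stego)

z_fixed = resample_norm(z_stego, rng)

# Cosine similarity between the original and rescaled vectors.
cosine = float(z_fixed @ z_stego / (np.linalg.norm(z_fixed) * np.linalg.norm(z_stego)))
print("norm before:", np.linalg.norm(z_stego))
print("norm after: ", np.linalg.norm(z_fixed))
print("cosine similarity:", cosine)
```

Because only the length changes, any message extraction that depends on the direction of the latent vector is unaffected by the rescaling in this sketch.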

🛡️ Threat Analysis

Output Integrity Attack

The paper analyzes and attacks the output integrity of AI-generated images: it detects hidden steganographic modifications to diffusion model outputs by exploiting statistical properties of the latent vector norm, then proposes a fix to make such modifications undetectable — directly addressing covert tampering with generative model outputs.

Details

Domains

generative

Model Types

diffusion

Threat Tags

white_box · inference_time · targeted

Applications

generative steganography · latent diffusion model image generation

Similar Papers

Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!

Roots Beneath the Cut: Uncovering the Risk of Concept Revival in Pruning-Based Unlearning for Diffusion Models

Attacks on Approximate Caches in Text-to-Image Diffusion Models

Diffusion-Based Image Editing: An Unforeseen Adversary to Robust Invisible Watermarks

OSI: One-step Inversion Excels in Extracting Diffusion Watermarks

Identifying Models Behind Text-to-Image Leaderboards

ActErase: A Training-Free Paradigm for Precise Concept Erasure via Activation Patching

Removal Attack and Defense on AI-generated Content Latent-based Watermarking