CAHS-Attack: CLIP-Aware Heuristic Search Attack Method for Stable Diffusion
Shuhan Xia 1, Jing Dai 2, Hui Ouyang 3, Yadong Shang 2, Dongxiao Zhao 3, Peipei Li 1
Published on arXiv: 2511.21180
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
CAHS-Attack achieves state-of-the-art adversarial attack success on Stable Diffusion across diverse prompt lengths and semantics using only black-box access via MCTS-guided suffix optimization.
Novel technique introduced: CAHS-Attack
Diffusion models exhibit notable fragility when faced with adversarial prompts, and strengthening attack capabilities is crucial for uncovering such vulnerabilities and building more robust generative systems. Existing works often rely either on white-box access to model gradients, which is rarely available in real-world deployments, or on hand-crafted prompt engineering, which yields weak attacks. In this paper, we propose CAHS-Attack, a CLIP-Aware Heuristic Search attack method. CAHS-Attack integrates Monte Carlo Tree Search (MCTS) to perform fine-grained suffix optimization, leverages a constrained genetic algorithm to preselect high-potential adversarial prompts as root nodes, and retains the most semantically disruptive outcome at each simulation rollout for efficient local search. Extensive experiments demonstrate that our method achieves state-of-the-art attack performance across both short and long prompts of varying semantics. Furthermore, we find that the fragility of SD models can be attributed to the inherent vulnerability of their CLIP-based text encoders, suggesting a fundamental security risk in current text-to-image pipelines.
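The search loop described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation. It assumes the reward is one minus the cosine similarity between the embeddings of the clean and suffixed prompts, and it replaces the CLIP text encoder with `toy_embed`, a hashed character-trigram stand-in. `VOCAB`, `MAX_LEN`, the UCB constant, and all function names are hypothetical.

```python
import math
import random
import zlib

VOCAB = ["zx", "qq", "##", "vv", "kk", "!!"]  # hypothetical suffix tokens
MAX_LEN = 3   # assumed maximum suffix length, in tokens
DIM = 64      # size of the toy embedding


def toy_embed(text):
    """Stand-in for a CLIP text encoder: hashed character-trigram counts."""
    vec = [0.0] * DIM
    for i in range(len(text) - 2):
        vec[zlib.crc32(text[i:i + 3].encode()) % DIM] += 1.0
    return vec


def disruption(prompt, suffix):
    """Assumed reward: 1 - cosine similarity of clean vs. suffixed prompt."""
    a = toy_embed(prompt)
    b = toy_embed(" ".join((prompt, *suffix)))
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 0.0


class Node:
    def __init__(self, suffix, parent=None):
        self.suffix, self.parent = suffix, parent
        self.children, self.visits, self.value = {}, 0, 0.0

    def untried(self):
        """Tokens not yet expanded from this node (none once at MAX_LEN)."""
        if len(self.suffix) >= MAX_LEN:
            return []
        return [t for t in VOCAB if t not in self.children]


def ucb(child, parent_visits, c=1.4):
    """UCB1 score: mean reward plus an exploration bonus."""
    explore = math.sqrt(math.log(parent_visits) / child.visits)
    return child.value / child.visits + c * explore


def mcts(prompt, root_suffix=(), iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node(tuple(root_suffix))
    best_suffix, best_score = root.suffix, disruption(prompt, root.suffix)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and has children.
        while not node.untried() and node.children:
            node = max(node.children.values(),
                       key=lambda ch: ucb(ch, node.visits))
        # Expansion: add one untried token, if any remain.
        options = node.untried()
        if options:
            tok = rng.choice(options)
            node.children[tok] = Node(node.suffix + (tok,), node)
            node = node.children[tok]
        # Rollout: complete the suffix at random and score it black-box.
        rollout = list(node.suffix)
        while len(rollout) < MAX_LEN:
            rollout.append(rng.choice(VOCAB))
        reward = disruption(prompt, rollout)
        # Retain the most disruptive outcome seen in any rollout.
        if reward > best_score:
            best_suffix, best_score = tuple(rollout), reward
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best_suffix, best_score


best_suffix, best_score = mcts("a photo of a cat")
print(best_suffix, round(best_score, 3))
```

Only `disruption` touches the model, so swapping in a real encoder or an image-level metric would leave the search loop unchanged. That is the property that makes this kind of attack black-box.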
Key Contributions
- CAHS-Attack: a black-box adversarial prompt suffix attack combining MCTS for fine-grained suffix optimization with a constrained genetic algorithm for high-potential root node preselection
- Demonstrates that Stable Diffusion's fragility to adversarial prompts is attributable to fundamental vulnerabilities in its CLIP-based text encoder
- Achieves state-of-the-art attack performance across both short and long prompts of varying semantics without requiring gradient access
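The genetic preselection step named in the first contribution might look roughly like the sketch below. The summary does not say which constraints the paper imposes, so this example assumes two simple ones: suffixes have a fixed length and draw only from a fixed vocabulary. `preselect_roots`, `toy_fitness`, `VOCAB`, and every hyperparameter are hypothetical. In a real attack the fitness would be a black-box, CLIP-based disruption score.

```python
import random

VOCAB = ["zx", "qq", "##", "vv", "kk", "!!"]  # hypothetical suffix tokens


def toy_fitness(suffix):
    """Toy stand-in for a disruption score: distinct characters used."""
    return len(set("".join(suffix)))


def preselect_roots(fitness, vocab, length=3, pop_size=20, generations=15,
                    top_k=3, mutation_rate=0.2, seed=0):
    """Evolve fixed-length suffixes and return up to top_k distinct ones.

    Assumes length >= 2 and pop_size >= 4. Every individual satisfies the
    constraints (fixed length, tokens from vocab) by construction.
    """
    rng = random.Random(seed)
    population = [tuple(rng.choice(vocab) for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:pop_size // 2]           # keep the fitter half
        children = []
        while len(elite) + len(children) < pop_size:
            mum, dad = rng.sample(elite, 2)
            cut = rng.randrange(1, length)       # single-point crossover
            child = list(mum[:cut] + dad[cut:])
            for i in range(length):              # mutation stays inside vocab
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(vocab)
            children.append(tuple(child))
        population = elite + children
    # Deduplicate in a deterministic order, then rank by fitness.
    distinct = sorted(dict.fromkeys(population), key=fitness, reverse=True)
    return distinct[:top_k]


roots = preselect_roots(toy_fitness, VOCAB)
print(roots)
```

Each returned suffix could then seed one tree in the MCTS stage, so the tree search refines candidates that already score well instead of starting from an empty suffix.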
🛡️ Threat Analysis
CAHS-Attack performs token-level adversarial suffix optimization against Stable Diffusion's CLIP text encoder at inference time, using MCTS and a constrained genetic algorithm. This is adversarial input manipulation that causes unintended or unsafe model outputs. It is analogous to gradient-based adversarial suffix attacks, but relies on black-box heuristic search instead of gradients.