defense 2025

SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation

Sumin Yu , Taesup Moon

0 citations · 39 references · arXiv

Published on arXiv

2511.11014

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

SP-Guard generates safer images than existing inference-time guidance methods while minimizing unintended alteration of benign image content through selective spatial masking

SP-Guard

Novel technique introduced


While diffusion-based text-to-image (T2I) models have achieved remarkable image generation quality, they also make harmful content easy to create, raising social concerns and highlighting the need for safer generation. Existing inference-time guidance methods lack both adaptivity (adjusting guidance strength based on the prompt) and selectivity (targeting only the unsafe regions of the image). Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask so that only unsafe areas are guided. Experiments show that SP-Guard generates safer images than existing methods while minimizing unintended content alteration. Beyond improving safety, our findings highlight the importance of transparency and controllability in image generation.


Key Contributions

  • Prompt harmfulness estimator that dynamically adapts guidance strength based on the estimated risk level of the input prompt
  • Selective spatial guidance mask that restricts safety intervention to only the unsafe regions of the generated image, preserving benign content fidelity
  • Demonstrates that adaptivity and selectivity are jointly necessary for effective safe T2I generation without over-restriction
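
The first two contributions above can be illustrated with a minimal sketch of one denoising-step update. This is not the authors' implementation: the summary gives no formulas, so the function name `selective_adaptive_guidance`, the linear scaling by `harm_score`, the quantile-based mask, and all default values are assumptions chosen only to show how adaptivity and selectivity might combine. The inputs stand in for noise predictions of shape (channels, height, width) from a diffusion model.

```python
import numpy as np


def selective_adaptive_guidance(eps_uncond, eps_prompt, eps_unsafe,
                                harm_score, cfg_scale=7.5,
                                max_safety_scale=10.0, mask_quantile=0.9):
    """Hypothetical sketch: classifier-free guidance plus a masked safety term.

    All arrays have shape (C, H, W). `harm_score` is an assumed prompt
    harmfulness estimate in [0, 1]; how SP-Guard computes it is not
    described in this summary.
    """
    # Standard classifier-free guidance toward the user prompt.
    guided = eps_uncond + cfg_scale * (eps_prompt - eps_uncond)

    # Adaptivity (assumed form): safety strength grows with estimated harm.
    strength = max_safety_scale * float(np.clip(harm_score, 0.0, 1.0))

    # Direction pointing toward the unsafe concept.
    unsafe_dir = eps_unsafe - eps_uncond

    # Selectivity (assumed form): keep only the spatial locations where the
    # unsafe direction is strongest, averaged over channels.
    magnitude = np.abs(unsafe_dir).mean(axis=0)            # shape (H, W)
    threshold = np.quantile(magnitude, mask_quantile)
    mask = (magnitude > threshold).astype(eps_uncond.dtype)

    # Steer away from the unsafe concept only inside the mask.
    safe = guided - strength * mask[None, :, :] * unsafe_dir
    return safe, mask
```

Under these assumptions, a prompt with `harm_score = 0` receives plain classifier-free guidance, and even a fully harmful prompt leaves every location outside the mask untouched, which is the behavior the contributions describe as avoiding over-restriction.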

🛡️ Threat Analysis

Output Integrity Attack

SP-Guard enforces the integrity and safety of diffusion model outputs at inference time: it estimates prompt harmfulness and selectively steers generation away from unsafe image regions. The selective spatial mask exists specifically to ensure that generated image content meets safety standards, which makes this a case of output integrity enforcement for generative AI systems.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
inference_time
Applications
text-to-image generation, safe generative ai, content safety filtering