SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation
Published on arXiv
2511.11014
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
SP-Guard generates safer images than existing inference-time guidance methods while minimizing unintended alteration of benign image content through selective spatial masking
SP-Guard
Novel technique introduced
While diffusion-based text-to-image (T2I) models have achieved remarkable image generation quality, they also make harmful content easy to create, raising social concerns and highlighting the need for safer generation. Existing inference-time guidance methods lack both adaptivity (adjusting guidance strength based on the prompt) and selectivity (targeting only the unsafe regions of the image). Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask so that only unsafe areas are guided. Experiments show that SP-Guard generates safer images than existing methods while minimizing unintended content alteration. Beyond improving safety, our findings highlight the importance of transparency and controllability in image generation.
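The abstract does not give SP-Guard's update rule. As a rough illustration only, adaptivity and selectivity can be pictured on top of standard classifier-free guidance with an extra negative-guidance term. Here $c_u$ (an embedding of the unsafe concept), the scale function $\lambda(h)$ and the mask $M$ are assumed notation, not taken from the paper:

```latex
\tilde{\epsilon}_t
  = \epsilon_\theta(z_t,\varnothing)
  + s\,\big[\epsilon_\theta(z_t,c_p) - \epsilon_\theta(z_t,\varnothing)\big]
  - \lambda(h)\, M \odot \big[\epsilon_\theta(z_t,c_u) - \epsilon_\theta(z_t,\varnothing)\big]
```

In this sketch, $z_t$ is the noisy latent at step $t$, $c_p$ the prompt embedding, $\varnothing$ the empty prompt, and $s$ the usual guidance scale. $h \in [0,1]$ is an estimated prompt harmfulness, and $\lambda(h) \ge 0$ grows with $h$ and is zero for benign prompts (adaptivity). $M \in \{0,1\}^{H \times W}$ is a spatial mask broadcast over latent channels, so the safety term only acts on flagged regions (selectivity).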
Key Contributions
- Prompt harmfulness estimator that dynamically adapts guidance strength based on the estimated risk level of the input prompt
- Selective spatial guidance mask that restricts safety intervention to only the unsafe regions of the generated image, preserving benign content fidelity
- Demonstrates that adaptivity and selectivity are jointly necessary for effective safe T2I generation without over-restriction
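The two mechanisms listed above can be sketched in NumPy on a single denoising step. This is a hypothetical illustration, not SP-Guard's implementation, and it rests on three assumptions. First, the model exposes unconditional, prompt-conditioned and unsafe-concept noise predictions of shape (C, H, W). Second, the harmfulness score is already given in [0, 1]; the threshold-then-linear mapping in `adaptive_scale` is invented for illustration. Third, `selective_mask` flags the top-decile positions where the unsafe-concept prediction departs from the unconditional one, which is likewise an invented heuristic.

```python
import numpy as np

def adaptive_scale(harm_score: float, max_scale: float = 7.0, threshold: float = 0.5) -> float:
    """HYPOTHETICAL mapping from estimated harmfulness in [0, 1] to safety-guidance
    strength: zero at or below `threshold`, then linear up to `max_scale` at 1.0."""
    h = float(np.clip(harm_score, 0.0, 1.0))
    if h <= threshold:
        return 0.0
    return max_scale * (h - threshold) / (1.0 - threshold)

def selective_mask(eps_unsafe: np.ndarray, eps_uncond: np.ndarray, quantile: float = 0.9) -> np.ndarray:
    """HYPOTHETICAL binary (H, W) mask: positions where the unsafe-concept
    prediction deviates most from the unconditional prediction."""
    # Per-position deviation, averaged over latent channels: (C, H, W) -> (H, W)
    diff = np.abs(eps_unsafe - eps_uncond).mean(axis=0)
    cutoff = np.quantile(diff, quantile)
    return (diff >= cutoff).astype(eps_uncond.dtype)

def guided_noise(eps_uncond: np.ndarray, eps_prompt: np.ndarray, eps_unsafe: np.ndarray,
                 harm_score: float, cfg_scale: float = 7.5) -> np.ndarray:
    """Classifier-free guidance plus a masked, adaptively scaled term that
    steers the prediction away from the unsafe concept."""
    cfg = eps_uncond + cfg_scale * (eps_prompt - eps_uncond)
    lam = adaptive_scale(harm_score)
    if lam == 0.0:
        return cfg  # prompt judged benign: no safety intervention at all
    mask = selective_mask(eps_unsafe, eps_uncond)  # (H, W), broadcast over channels
    return cfg - lam * mask[None, :, :] * (eps_unsafe - eps_uncond)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e_u, e_p, e_s = (rng.standard_normal((4, 8, 8)) for _ in range(3))
    benign = guided_noise(e_u, e_p, e_s, harm_score=0.2)  # identical to plain CFG
    risky = guided_noise(e_u, e_p, e_s, harm_score=0.9)   # altered only inside the mask
    print(np.abs(risky - benign).max() > 0)
```

The point of the design is that a low harmfulness score leaves the output identical to ordinary guidance, and a high score changes only the masked positions. That mirrors the paper's stated goal of avoiding over-restriction and unintended alteration of benign content.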
🛡️ Threat Analysis
SP-Guard directly controls the integrity and safety of diffusion model outputs by estimating prompt harmfulness and selectively steering generation away from unsafe image regions at inference time, which amounts to output integrity enforcement for generative AI systems. The selective spatial masking exists specifically to ensure that generated image content meets safety standards.