SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation

While diffusion-based text-to-image (T2I) models have achieved remarkable image generation quality, they also make it easy to create harmful content, raising social concerns and highlighting the need for safer generation. Existing inference-time guidance methods lack both adaptivity (adjusting guidance strength based on the prompt) and selectivity (targeting only unsafe regions of the image). Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask so that only unsafe areas are guided. Experiments show that SP-Guard generates safer images than existing methods while minimizing unintended content alteration. Beyond improving safety, our findings highlight the importance of transparency and controllability in image generation.

Key Contributions

A prompt harmfulness estimator that dynamically adapts guidance strength to the estimated risk level of the input prompt
A selective spatial guidance mask that restricts safety intervention to the unsafe regions of the generated image, preserving the fidelity of benign content
Evidence that adaptivity and selectivity are jointly necessary for effective safe T2I generation without over-restriction
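
How the two ideas above might combine in a single denoising step can be sketched as follows. This is an illustrative sketch only, not the paper's formulation: the linear mapping from harmfulness score to guidance strength, the function names (adaptive_scale, guided_noise), the max_scale parameter, and the flat-list representation of the noise prediction are all assumptions made for the example.

```python
def adaptive_scale(harm_score, max_scale=10.0):
    """Map a harmfulness score in [0, 1] to a guidance strength.

    The linear rule is a placeholder assumption; the source does not
    specify how the score is turned into a strength.
    """
    clipped = min(max(harm_score, 0.0), 1.0)
    return max_scale * clipped


def guided_noise(eps_prompt, unsafe_dir, mask, harm_score, max_scale=10.0):
    """Steer the noise prediction away from an unsafe direction.

    eps_prompt : flat list, noise prediction for the user prompt
    unsafe_dir : flat list, per-element direction toward the unsafe concept
    mask       : flat list of 0/1 flags, 1 marks an unsafe region
    harm_score : estimated prompt harmfulness in [0, 1]
    """
    scale = adaptive_scale(harm_score, max_scale)
    # Elements with mask == 0 are returned unchanged (selectivity);
    # a benign prompt yields scale == 0 and changes nothing (adaptivity).
    return [e - scale * m * d for e, d, m in zip(eps_prompt, unsafe_dir, mask)]


eps = [0.5, -0.2, 0.1, 0.3]
direction = [0.1, 0.1, 0.1, 0.1]
mask = [0, 1, 1, 0]

benign = guided_noise(eps, direction, mask, harm_score=0.0)
harmful = guided_noise(eps, direction, mask, harm_score=1.0)
```

In this toy setup, the benign call leaves every element untouched, while the harmful call shifts only the two masked elements. That is the behaviour the contributions describe: intervention strength follows the prompt, and intervention location follows the mask.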

🛡️ Threat Analysis

Output Integrity Attack

SP-Guard enforces the integrity and safety of diffusion model outputs: at inference time it estimates prompt harmfulness and selectively steers generation away from unsafe image regions. This is output integrity enforcement for generative AI systems, and the selective spatial masking is aimed directly at ensuring that generated image content meets safety standards.
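
The summary does not say how the unsafe regions are located. One plausible approach, shown here purely as an assumption, is to threshold the per-element gap between an unconditional noise prediction and a prediction conditioned on an unsafe concept. The names selective_mask and mask_coverage and the threshold value are hypothetical.

```python
def selective_mask(eps_uncond, eps_unsafe, threshold=0.5):
    """Return 0/1 flags marking elements that respond strongly to the unsafe concept.

    Assumption: a large absolute gap between the two predictions indicates
    a region where unsafe content would appear. This rule is illustrative
    and is not taken from the source.
    """
    return [
        1 if abs(u - b) > threshold else 0
        for b, u in zip(eps_uncond, eps_unsafe)
    ]


def mask_coverage(mask):
    """Fraction of elements that would receive safety guidance."""
    return sum(mask) / len(mask) if mask else 0.0


eps_uncond = [0.0, 0.0, 0.0, 0.0]
eps_unsafe = [0.05, 0.9, -0.7, 0.1]

mask = selective_mask(eps_uncond, eps_unsafe, threshold=0.5)
coverage = mask_coverage(mask)
```

A small coverage value means most of the image is left alone, which is one way to reason about the "minimizing unintended content alteration" goal stated in the abstract.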

Details

Domains

vision, generative

Model Types

diffusion

Threat Tags

inference_time

Applications

2026 0 cit.

Output Integrity Attack: 100%

Similar Papers

FIND: A Simple yet Effective Baseline for Diffusion-Generated Image Detection

Proof-of-Authorship for Diffusion-based AI Generated Content

A Difference-in-Difference Approach to Detecting AI-Generated Images

T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models

I2VWM: Robust Watermarking for Image to Video Generation

Gaussian Shannon: High-Precision Diffusion Model Watermarking Based on Communication

VideoEraser: Concept Erasure in Text-to-Video Diffusion Models

ShapeMark: Robust and Diversity-Preserving Watermarking for Diffusion Models