defense arXiv Nov 14, 2025 · Nov 2025
Sumin Yu, Taesup Moon · Seoul National University
Defends T2I diffusion models against harmful image generation from unsafe prompts using prompt-adaptive guidance strength and selective spatial masking of unsafe regions
Output Integrity Attack · vision · generative
While diffusion-based text-to-image (T2I) models have achieved remarkable image generation quality, they also make it easy to create harmful content, raising social concerns and highlighting the need for safer generation. Existing inference-time guidance methods lack both adaptivity (adjusting guidance strength based on the prompt) and selectivity (targeting only unsafe regions of the image). Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask to guide only unsafe areas. Experiments show that SP-Guard generates safer images than existing methods while minimizing unintended content alteration. Beyond improving safety, our findings highlight the importance of transparency and controllability in image generation.
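
The abstract does not state the update rule, so the following is only a minimal sketch of how prompt-adaptive, spatially selective guidance could be combined in one denoising step. It is not the authors' implementation. The function name `selective_adaptive_guidance`, the scalar `harm_score` in [0, 1], the quantile-based mask, and all default values are hypothetical choices made for illustration, and the noise predictions are assumed to be plain arrays of shape (C, H, W).

```python
import numpy as np

def selective_adaptive_guidance(eps_uncond, eps_prompt, eps_unsafe,
                                harm_score, cfg_scale=7.5,
                                max_safety_scale=10.0, mask_quantile=0.8):
    """Illustrative guidance step; every parameter here is an assumption.

    eps_uncond, eps_prompt, eps_unsafe: noise predictions of shape (C, H, W)
    for the empty prompt, the user prompt, and an unsafe-concept prompt.
    harm_score: estimated prompt harmfulness, clipped to [0, 1].
    """
    # Standard classifier-free guidance toward the user prompt.
    guided = eps_uncond + cfg_scale * (eps_prompt - eps_uncond)

    # Direction in noise space that points toward the unsafe concept.
    unsafe_dir = eps_unsafe - eps_uncond

    # Adaptivity (assumed form): safety strength grows with prompt harmfulness.
    safety_scale = max_safety_scale * float(np.clip(harm_score, 0.0, 1.0))

    # Selectivity (assumed form): keep only spatial positions where the
    # unsafe direction is strong, measured per pixel across channels.
    magnitude = np.abs(unsafe_dir).mean(axis=0, keepdims=True)  # (1, H, W)
    threshold = np.quantile(magnitude, mask_quantile)
    mask = (magnitude > threshold).astype(guided.dtype)

    # Push away from the unsafe concept only inside the masked region.
    return guided - safety_scale * mask * unsafe_dir
```

With `harm_score=0.0` the sketch reduces to ordinary classifier-free guidance, which mirrors the stated goal of leaving benign prompts unaltered; with a high score, only the masked positions are changed.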
diffusion