PoseGuard: Pose-Guided Generation with Safety Guardrails
Kongxin Wang 1, Jie Zhang 2, Peigui Qi 1, Kunsheng Tang 1, Tianwei Zhang 3, Wenbo Zhou 1
Published on arXiv (arXiv:2508.02476)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
PoseGuard effectively suppresses unsafe generations across discriminatory, NSFW, and copyright-infringing pose categories while maintaining high-fidelity output for benign inputs and remaining robust to slight pose perturbations.
PoseGuard
Novel technique introduced
Pose-guided video generation has become a powerful tool in creative industries, exemplified by frameworks like Animate Anyone. However, conditioning generation on specific poses introduces serious risks, such as impersonation, privacy violations, and NSFW content creation. To address these challenges, we propose **PoseGuard**, a safety alignment framework for pose-guided generation. PoseGuard is designed to suppress unsafe generations by degrading output quality when encountering malicious poses, while maintaining high-fidelity outputs for benign inputs. We categorize unsafe poses into three representative types: discriminatory gestures such as kneeling or offensive salutes, sexually suggestive poses that lead to NSFW content, and poses imitating copyrighted celebrity movements. PoseGuard employs a dual-objective training strategy combining generation fidelity with safety alignment, and uses LoRA-based fine-tuning for efficient, parameter-light updates. To ensure adaptability to evolving threats, PoseGuard supports pose-specific LoRA fusion, enabling flexible and modular updates when new unsafe poses are identified. We further demonstrate the generalizability of PoseGuard to facial landmark-guided generation. Extensive experiments validate that PoseGuard effectively blocks unsafe generations, maintains generation quality for benign inputs, and remains robust against slight pose variations.
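The dual-objective training strategy described above can be sketched as a combined loss: a fidelity term that keeps benign-pose outputs close to their ground truth, plus a safety term that redirects unsafe-pose outputs toward a predefined safe target. This is a minimal NumPy illustration, not the paper's implementation; the function name, the MSE form of each term, and the weighting parameter `lam` are all assumptions.

```python
import numpy as np

def dual_objective_loss(pred_benign, target_benign,
                        pred_unsafe, safe_target, lam=1.0):
    """Hypothetical sketch of a dual-objective safety-alignment loss.

    - Fidelity term: benign-pose predictions should match the real target.
    - Safety term: unsafe-pose predictions are pulled toward a predefined
      safe (degraded) target rather than the harmful ground truth.
    `lam` balances the two objectives (name and value are assumptions).
    """
    fidelity = np.mean((pred_benign - target_benign) ** 2)
    safety = np.mean((pred_unsafe - safe_target) ** 2)
    return fidelity + lam * safety
```

Under this framing, a model that renders benign poses faithfully and collapses unsafe poses onto the safe target drives both terms, and hence the total loss, toward zero.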
Key Contributions
- First framework to embed safety guardrails directly into pose-guided video generation model parameters (UNet/LoRA), eliminating reliance on bypassable external filters
- Dual-objective training strategy combining generation fidelity loss for benign poses with safety alignment loss that redirects unsafe pose outputs to predefined safe targets
- Pose-specific LoRA fusion mechanism enabling modular, incremental updates to defense coverage as new unsafe pose categories are identified
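The pose-specific LoRA fusion mechanism in the last contribution can be pictured as summing low-rank weight deltas into the base model. The sketch below assumes the standard LoRA formulation (each adapter contributes a rank-r update B @ A); the function name, the `alphas` scaling list, and the plain additive merge are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def fuse_lora_adapters(base_weight, adapters, alphas=None):
    """Hypothetical sketch of pose-specific LoRA fusion.

    Each adapter is a low-rank pair (A, B) trained against one unsafe
    pose category; fusing adds each delta B @ A into the base weight,
    so defense coverage can grow incrementally as new unsafe poses
    are identified, without retraining the full model.
    """
    if alphas is None:
        alphas = [1.0] * len(adapters)
    fused = base_weight.copy()
    for (A, B), alpha in zip(adapters, alphas):
        fused = fused + alpha * (B @ A)  # low-rank update, rank = A.shape[0]
    return fused
```

Because each unsafe pose category lives in its own adapter, a newly discovered category only requires training and fusing one additional low-rank pair.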
🛡️ Threat Analysis
PoseGuard addresses the output integrity of AI-generated content: it prevents harmful AI-generated video outputs (deepfakes, NSFW content, impersonation of celebrities) by degrading output quality for unsafe poses. The threat model centers on misuse of generative AI to produce harmful content, and the defense ensures safe, authentic outputs from the model.