PoseGuard: Pose-Guided Generation with Safety Guardrails
Kongxin Wang 1, Jie Zhang 2, Peigui Qi 1, Kunsheng Tang 1, Tianwei Zhang 3, Wenbo Zhou 1
Published on arXiv (arXiv:2508.02476)
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
PoseGuard effectively suppresses unsafe generations across discriminatory, NSFW, and copyright-infringing pose categories while maintaining high-fidelity output for benign inputs and remaining robust to slight pose perturbations.
PoseGuard
Novel technique introduced
Pose-guided video generation has become a powerful tool in creative industries, exemplified by frameworks like Animate Anyone. However, conditioning generation on specific poses introduces serious risks, such as impersonation, privacy violations, and NSFW content creation. To address these challenges, we propose **PoseGuard**, a safety alignment framework for pose-guided generation. PoseGuard is designed to suppress unsafe generations by degrading output quality when encountering malicious poses, while maintaining high-fidelity outputs for benign inputs. We categorize unsafe poses into three representative types: discriminatory gestures such as kneeling or offensive salutes, sexually suggestive poses that lead to NSFW content, and poses imitating copyrighted celebrity movements. PoseGuard employs a dual-objective training strategy combining generation fidelity with safety alignment, and uses LoRA-based fine-tuning for efficient, parameter-light updates. To ensure adaptability to evolving threats, PoseGuard supports pose-specific LoRA fusion, enabling flexible and modular updates when new unsafe poses are identified. We further demonstrate the generalizability of PoseGuard to facial landmark-guided generation. Extensive experiments validate that PoseGuard effectively blocks unsafe generations, maintains generation quality for benign inputs, and remains robust against slight pose variations.
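The dual-objective training strategy described above can be sketched as a combined loss: a fidelity term that keeps benign-pose outputs close to their ground truth, plus a safety term that redirects unsafe-pose outputs toward a predefined safe target. This is a minimal NumPy illustration, not the paper's implementation; the function name, the MSE form of each term, and the weighting parameter `lam` are all assumptions.

```python
import numpy as np

def dual_objective_loss(pred_benign, target_benign,
                        pred_unsafe, safe_target, lam=1.0):
    """Hypothetical sketch of a dual-objective safety-alignment loss.

    - Fidelity term: benign-pose predictions should match the real target.
    - Safety term: unsafe-pose predictions are pulled toward a predefined
      safe (degraded) target rather than the harmful ground truth.
    `lam` balances the two objectives (name and value are assumptions).
    """
    fidelity = np.mean((pred_benign - target_benign) ** 2)
    safety = np.mean((pred_unsafe - safe_target) ** 2)
    return fidelity + lam * safety
```

Under this framing, a model that renders benign poses faithfully and collapses unsafe poses onto the safe target drives both terms, and hence the total loss, toward zero.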
Key Contributions
- First framework to embed safety guardrails directly into pose-guided video generation model parameters (UNet/LoRA), eliminating reliance on bypassable external filters
- Dual-objective training strategy combining generation fidelity loss for benign poses with safety alignment loss that redirects unsafe pose outputs to predefined safe targets
- Pose-specific LoRA fusion mechanism enabling modular, incremental updates to defense coverage as new unsafe pose categories are identified
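The pose-specific LoRA fusion mechanism in the last contribution can be pictured as summing low-rank weight deltas into the base model. The sketch below assumes the standard LoRA formulation (each adapter contributes a rank-r update B @ A); the function name, the `alphas` scaling list, and the plain additive merge are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def fuse_lora_adapters(base_weight, adapters, alphas=None):
    """Hypothetical sketch of pose-specific LoRA fusion.

    Each adapter is a low-rank pair (A, B) trained against one unsafe
    pose category; fusing adds each delta B @ A into the base weight,
    so defense coverage can grow incrementally as new unsafe poses
    are identified, without retraining the full model.
    """
    if alphas is None:
        alphas = [1.0] * len(adapters)
    fused = base_weight.copy()
    for (A, B), alpha in zip(adapters, alphas):
        fused = fused + alpha * (B @ A)  # low-rank update, rank = A.shape[0]
    return fused
```

Because each unsafe pose category lives in its own adapter, a newly discovered category only requires training and fusing one additional low-rank pair.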
🛡️ Threat Analysis
PoseGuard addresses the output integrity of AI-generated content: it prevents harmful AI-generated video outputs (deepfakes, NSFW content, impersonation of celebrities) by degrading output quality for unsafe poses. The threat model centers on misuse of generative AI to produce harmful content, and the defense ensures safe, authentic outputs from the model.