
PoseGuard: Pose-Guided Generation with Safety Guardrails

Kongxin Wang 1, Jie Zhang 2, Peigui Qi 1, Kunsheng Tang 1, Tianwei Zhang 3, Wenbo Zhou 1


Published on arXiv: 2508.02476

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

PoseGuard effectively suppresses unsafe generations across discriminatory, NSFW, and copyright-infringing pose categories while maintaining high-fidelity output for benign inputs and remaining robust to slight pose perturbations.

PoseGuard

Novel technique introduced


Pose-guided video generation has become a powerful tool in creative industries, exemplified by frameworks like Animate Anyone. However, conditioning generation on specific poses introduces serious risks, such as impersonation, privacy violations, and NSFW content creation. To address these challenges, we propose PoseGuard, a safety alignment framework for pose-guided generation. PoseGuard is designed to suppress unsafe generations by degrading output quality when encountering malicious poses, while maintaining high-fidelity outputs for benign inputs. We categorize unsafe poses into three representative types: discriminatory gestures such as kneeling or offensive salutes, sexually suggestive poses that lead to NSFW content, and poses imitating copyrighted celebrity movements. PoseGuard employs a dual-objective training strategy combining generation fidelity with safety alignment, and uses LoRA-based fine-tuning for efficient, parameter-light updates. To ensure adaptability to evolving threats, PoseGuard supports pose-specific LoRA fusion, enabling flexible and modular updates when new unsafe poses are identified. We further demonstrate the generalizability of PoseGuard to facial landmark-guided generation. Extensive experiments validate that PoseGuard effectively blocks unsafe generations, maintains generation quality for benign inputs, and remains robust against slight pose variations.
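The pose-specific LoRA fusion described in the abstract can be illustrated with a small sketch. This is an assumption-laden illustration, not the paper's implementation: the function name `fuse_loras`, the per-category weights `alphas`, and the toy dimensions are all hypothetical; the general form `W' = W + Σ αᵢ Bᵢ Aᵢ` is the standard way low-rank adapters are merged into a base weight.

```python
import torch

def fuse_loras(base_weight, loras, alphas):
    """Hypothetical sketch of pose-specific LoRA fusion: each unsafe-pose
    category i contributes a low-rank update (A_i, B_i), and the deltas
    are merged into the base weight as a weighted sum W + sum_i a_i B_i A_i."""
    fused = base_weight.clone()
    for (A, B), alpha in zip(loras, alphas):
        fused += alpha * (B @ A)  # low-rank delta B @ A matches base shape
    return fused

torch.manual_seed(0)
d, r = 16, 4  # toy weight size and LoRA rank (illustrative values)
W = torch.randn(d, d)
# Two pose-category adapters, e.g. one for NSFW poses, one for copyrighted moves
loras = [(torch.randn(r, d), torch.randn(d, r)) for _ in range(2)]
fused = fuse_loras(W, loras, alphas=[1.0, 1.0])
```

Because each category lives in its own adapter, adding coverage for a newly identified unsafe pose only requires training and fusing one more low-rank pair, leaving the base model and existing adapters untouched.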


Key Contributions

  • First framework to embed safety guardrails directly into pose-guided video generation model parameters (UNet/LoRA), eliminating reliance on bypassable external filters
  • Dual-objective training strategy combining generation fidelity loss for benign poses with safety alignment loss that redirects unsafe pose outputs to predefined safe targets
  • Pose-specific LoRA fusion mechanism enabling modular, incremental updates to defense coverage as new unsafe pose categories are identified
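The dual-objective training strategy in the second contribution can be sketched as a two-term loss: a standard denoising fidelity loss on benign poses plus a safety term that pulls unsafe-pose predictions toward a predefined safe target. Everything here is a minimal toy, assuming an MSE form for both terms: `ToyPoseUNet`, the batch layout, and the weight `lam` are hypothetical stand-ins, not the paper's actual architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyPoseUNet(nn.Module):
    """Toy stand-in for the pose-conditioned denoising backbone
    (the real PoseGuard model is an Animate Anyone-style video UNet)."""
    def __init__(self, dim=8):
        super().__init__()
        self.net = nn.Linear(dim * 2, dim)

    def forward(self, noisy_latents, pose):
        return self.net(torch.cat([noisy_latents, pose], dim=-1))

def poseguard_loss(model, benign, unsafe, safe_target, lam=1.0):
    """Dual-objective sketch: fidelity on benign poses plus a safety
    alignment term that redirects unsafe-pose outputs to a safe target.
    `lam` is a hypothetical weight balancing the two objectives."""
    l_fidelity = F.mse_loss(model(benign["noisy"], benign["pose"]), benign["noise"])
    l_safety = F.mse_loss(model(unsafe["noisy"], unsafe["pose"]), safe_target)
    return l_fidelity + lam * l_safety

torch.manual_seed(0)
model = ToyPoseUNet()
benign = {"noisy": torch.randn(4, 8), "pose": torch.randn(4, 8),
          "noise": torch.randn(4, 8)}
unsafe = {"noisy": torch.randn(4, 8), "pose": torch.randn(4, 8)}
safe_target = torch.zeros(4, 8)  # predefined degraded/safe output target
loss = poseguard_loss(model, benign, unsafe, safe_target)
loss.backward()  # gradients would update only the LoRA parameters in practice
```

In the actual framework the gradient update would touch only the LoRA adapter weights, which is what keeps the defense parameter-light and mergeable.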

🛡️ Threat Analysis

Output Integrity Attack

PoseGuard is fundamentally about output integrity of AI-generated content — it prevents harmful AI-generated video outputs (deepfakes, NSFW, impersonation of celebrities) by degrading output quality for unsafe poses. The threat model centers on misuse of generative AI to produce harmful content, and the defense ensures safe, authentic outputs from the model.


Details

Domains
vision, generative
Model Types
diffusion
Threat Tags
training_time, inference_time
Applications
pose-guided video generation, human animation, deepfake prevention, NSFW content filtering