Physically Realistic Sequence-Level Adversarial Clothing for Robust Human-Detection Evasion
Dingkun Zhou 1, Patrick P. K. Chan 1, Hengxu Wu 2, Shikang Zheng 1, Ruiqi Huang 1, Yuanjie Zhao 1
Published on arXiv
2511.16020
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Physical garments produced via sublimation printing reliably suppress human detection in indoor and outdoor recordings across entire video sequences, with strong cross-model transferability
Sequence-Level Adversarial Texture Optimization (EoT with temporal weighting)
Novel technique introduced
Deep neural networks used for human detection are highly vulnerable to adversarial manipulation, creating safety and privacy risks in real surveillance environments. Wearable attacks offer a realistic threat model, yet existing approaches usually optimize textures frame by frame and therefore fail to maintain concealment across long video sequences with motion, pose changes, and garment deformation. In this work, a sequence-level optimization framework is introduced to generate natural, printable adversarial textures for shirts, trousers, and hats that remain effective throughout entire walking videos in both digital and physical settings. Product images are first mapped to UV space and converted into a compact palette and control-point parameterization, with ICC locking to keep all colors printable. A physically based human-garment pipeline is then employed to simulate motion, multi-angle camera viewpoints, cloth dynamics, and illumination variation. An expectation-over-transformation objective with temporal weighting is used to optimize the control points so that detection confidence is minimized across whole sequences. Extensive experiments demonstrate strong and stable concealment, high robustness to viewpoint changes, and superior cross-model transferability. Physical garments produced with sublimation printing achieve reliable suppression under indoor and outdoor recordings, confirming real-world feasibility.
Key Contributions
- Sequence-level adversarial texture optimization framework that minimizes detection confidence across entire walking video sequences rather than individual frames
- UV-space parameterization with compact palette and control-point representation plus ICC profile locking to ensure all generated colors are physically printable
- Expectation-over-transformation objective with temporal weighting, combined with a physically based human-garment simulation pipeline that models cloth dynamics, multi-angle camera viewpoints, and illumination variation
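The core optimization idea can be sketched as follows: detection confidences are collected over random transformations (EoT) and averaged with per-frame temporal weights, and the texture parameters are tuned to drive that weighted confidence down. This is a minimal illustrative sketch, not the paper's implementation: the renderer stand-in, the exponential-decay weighting scheme, and all function names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def render_detection_scores(texture_params, n_frames=8):
    # Hypothetical stand-in for the real render-and-detect pipeline:
    # maps texture control points to per-frame detector confidences
    # in [0, 1] under one random transformation (pose, viewpoint, light).
    logits = texture_params.mean() + 0.1 * rng.standard_normal(n_frames)
    return 1.0 / (1.0 + np.exp(-logits))

def sequence_eot_loss(texture_params, n_transforms=4, n_frames=8, decay=0.9):
    # Temporal weights: exponentially decaying over the sequence
    # (an assumed scheme; the paper only states "temporal weighting").
    weights = decay ** np.arange(n_frames)
    weights /= weights.sum()
    # Expectation over transformations: average the temporally
    # weighted detection confidence across random renderings.
    losses = []
    for _ in range(n_transforms):
        scores = render_detection_scores(texture_params, n_frames)
        losses.append(float(np.dot(weights, scores)))
    return float(np.mean(losses))

params = rng.standard_normal(16)  # hypothetical control-point vector
loss = sequence_eot_loss(params)  # lower = better concealment
```

In the actual framework this loss would be minimized with a gradient-based optimizer over the palette/control-point parameterization, subject to the ICC printability constraint.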
🛡️ Threat Analysis
Crafts adversarial inputs (printable clothing textures) at inference time using gradient-based sequence-level optimization to cause misclassification (missed person detection) in both digital and physical settings — a direct input manipulation attack on vision-based object detectors.