Physically Realistic Sequence-Level Adversarial Clothing for Robust Human-Detection Evasion
Dingkun Zhou 1, Patrick P. K. Chan 1, Hengxu Wu 2, Shikang Zheng 1, Ruiqi Huang 1, Yuanjie Zhao 1
Published on arXiv
2511.16020
Input Manipulation Attack
OWASP ML Top 10 — ML01
Key Finding
Physical garments produced via sublimation printing reliably suppress human detection in indoor and outdoor recordings across entire video sequences, with strong cross-model transferability
Sequence-Level Adversarial Texture Optimization (EoT with temporal weighting)
Novel technique introduced
Deep neural networks used for human detection are highly vulnerable to adversarial manipulation, creating safety and privacy risks in real surveillance environments. Wearable attacks offer a realistic threat model, yet existing approaches usually optimize textures frame by frame and therefore fail to maintain concealment across long video sequences with motion, pose changes, and garment deformation. In this work, a sequence-level optimization framework is introduced to generate natural, printable adversarial textures for shirts, trousers, and hats that remain effective throughout entire walking videos in both digital and physical settings. Product images are first mapped to UV space and converted into a compact palette and control-point parameterization, with ICC locking to keep all colors printable. A physically based human-garment pipeline is then employed to simulate motion, multi-angle camera viewpoints, cloth dynamics, and illumination variation. An expectation-over-transformation objective with temporal weighting is used to optimize the control points so that detection confidence is minimized across whole sequences. Extensive experiments demonstrate strong and stable concealment, high robustness to viewpoint changes, and superior cross-model transferability. Physical garments produced with sublimation printing achieve reliable suppression under indoor and outdoor recordings, confirming real-world feasibility.
Key Contributions
- Sequence-level adversarial texture optimization framework that minimizes detection confidence across entire walking video sequences rather than individual frames
- UV-space parameterization with compact palette and control-point representation plus ICC profile locking to ensure all generated colors are physically printable
- Expectation-over-transformation objective with temporal weighting, combined with a physically based human-garment simulation pipeline that models cloth dynamics, multi-angle camera viewpoints, and illumination variation
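The core optimization idea can be sketched as follows: detection confidences are collected over random transformations (EoT) and averaged with per-frame temporal weights, and the texture parameters are tuned to drive that weighted confidence down. This is a minimal illustrative sketch, not the paper's implementation: the renderer stand-in, the exponential-decay weighting scheme, and all function names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def render_detection_scores(texture_params, n_frames=8):
    # Hypothetical stand-in for the real render-and-detect pipeline:
    # maps texture control points to per-frame detector confidences
    # in [0, 1] under one random transformation (pose, viewpoint, light).
    logits = texture_params.mean() + 0.1 * rng.standard_normal(n_frames)
    return 1.0 / (1.0 + np.exp(-logits))

def sequence_eot_loss(texture_params, n_transforms=4, n_frames=8, decay=0.9):
    # Temporal weights: exponentially decaying over the sequence
    # (an assumed scheme; the paper only states "temporal weighting").
    weights = decay ** np.arange(n_frames)
    weights /= weights.sum()
    # Expectation over transformations: average the temporally
    # weighted detection confidence across random renderings.
    losses = []
    for _ in range(n_transforms):
        scores = render_detection_scores(texture_params, n_frames)
        losses.append(float(np.dot(weights, scores)))
    return float(np.mean(losses))

params = rng.standard_normal(16)  # hypothetical control-point vector
loss = sequence_eot_loss(params)  # lower = better concealment
```

In the actual framework this loss would be minimized with a gradient-based optimizer over the palette/control-point parameterization, subject to the ICC printability constraint.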
🛡️ Threat Analysis
Crafts adversarial inputs (printable clothing textures) at inference time using gradient-based sequence-level optimization to cause misclassification (missed person detection) in both digital and physical settings — a direct input manipulation attack on vision-based object detectors.