attack arXiv Oct 21, 2025
Duoxun Tang, Xi Xiao, Guangwu Hu et al. · Tsinghua University · Shenzhen University of Information Technology +4 more
Zero-query black-box adversarial video attack using guided backpropagation feature maps to fool classifiers and bypass Video-LLM harmful content detection
Input Manipulation Attack Prompt Injection vision multimodal
The vulnerability of deep neural networks (DNNs) to adversarial examples is well established. Existing black-box adversarial attacks usually require multi-round interaction with the model and consume numerous queries, which is impractical in the real world and hard to scale to recently emerged Video-LLMs. Moreover, no attack in the video domain directly leverages feature maps to shift the clean-video feature space. We therefore propose FeatureFool, a stealthy, video-domain, zero-query black-box attack that uses information extracted from a DNN to alter the feature space of clean videos. Unlike query-based methods that rely on iterative interaction, FeatureFool performs a zero-query attack by directly exploiting DNN-extracted information, an approach unprecedented in the video domain. Experiments show that FeatureFool achieves an attack success rate above 70% against traditional video classifiers without any queries. Benefiting from the transferability of the feature map, it can also craft harmful content and bypass Video-LLM recognition. Additionally, adversarial videos generated by FeatureFool exhibit high quality in terms of SSIM, PSNR, and Temporal-Inconsistency, making the attack barely perceptible. This paper may contain violent or explicit content.
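The abstract's core idea, a guided-backpropagation feature map computed on a surrogate network steering a zero-query input perturbation, can be sketched in a few lines of NumPy. Everything below (the one-layer surrogate, shapes, step size) is an illustrative assumption, not the paper's actual model or attack:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy surrogate "classifier": one hidden ReLU layer plus a linear head.
# Weights and shapes are illustrative stand-ins, not the paper's network.
W1 = rng.normal(size=(16, 8))
w2 = rng.normal(size=16)

def guided_backprop(x):
    """Guided-backpropagation map of the target logit w.r.t. the input:
    at the ReLU, gradient flows back only where the activation AND the
    incoming gradient are both positive."""
    z = W1 @ x
    a = np.maximum(z, 0.0)
    logit = float(w2 @ a)
    grad_a = w2.copy()
    grad_a[(z <= 0) | (grad_a <= 0)] = 0.0  # the "guided" gate
    return W1.T @ grad_a, logit

x_clean = rng.normal(size=8)  # stand-in for a flattened clean video frame
g, _ = guided_backprop(x_clean)

# Zero-query step: perturb the input against the guided feature map to
# suppress the clean-class feature response, with no queries to the victim.
eps = 0.1
x_adv = np.clip(x_clean - eps * np.sign(g), -3.0, 3.0)
```

The point of the sketch is that every quantity is computed from the surrogate's own weights and activations, so no victim-model queries are needed; transferability of the feature map is what the paper relies on to carry the perturbation over to the black-box target.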
cnn vlm transformer Tsinghua University · Shenzhen University of Information Technology · Harbin Institute of Technology +3 more
defense arXiv Sep 22, 2025
Duoxun Tang, Xinhang Jiang, Jiajun Niu · Tsinghua University · The Chinese University of Hong Kong
Defends text embeddings against inversion attacks using RL-learned, geometry-aware orthogonal noise with PII signal guidance
Model Inversion Attack nlp
Text embedding inversion attacks reconstruct original sentences from latent representations, posing severe privacy threats in collaborative inference and edge computing. We propose TextCrafter, an optimization-based adversarial perturbation mechanism that combines RL-learned, geometry-aware noise injection orthogonal to user embeddings with cluster priors and PII-signal guidance to suppress inversion while preserving task utility. Unlike prior defenses, which are either non-learnable or agnostic to perturbation direction, TextCrafter provides a directional protective policy that balances privacy and utility. Under a strong privacy setting, TextCrafter maintains 70% classification accuracy on four datasets and consistently outperforms Gaussian/LDP baselines across lower privacy budgets, demonstrating a superior privacy-utility trade-off.
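The geometric kernel of the defense, noise constrained to the subspace orthogonal to the user embedding, is easy to illustrate. This is only a minimal sketch of that projection step; the RL policy, cluster priors, and PII guidance from the abstract are omitted, and the scale and sampling are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def orthogonal_noise(e, scale=0.2, rng=rng):
    """Add noise restricted to the subspace orthogonal to embedding e,
    leaving the embedding's own direction untouched. Scale and Gaussian
    sampling are illustrative assumptions, not the paper's learned policy."""
    n = rng.normal(size=e.shape)
    n -= ((n @ e) / (e @ e)) * e  # project out the component along e
    return e + scale * n

e = rng.normal(size=384)  # stand-in sentence embedding
e_priv = orthogonal_noise(e)

# The injected perturbation is orthogonal to e up to floating-point error.
assert abs((e_priv - e) @ e) < 1e-6 * np.linalg.norm(e) ** 2
```

Because the perturbation carries no component along the original embedding direction, inner products with that direction are preserved exactly, which is one way a directional perturbation can disrupt inversion while keeping downstream task signal.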
llm transformer Tsinghua University · The Chinese University of Hong Kong
defense arXiv Jan 5, 2026
Duoxun Tang, Xueyi Zhang, Chak Hin Wang et al. · Tsinghua University · The Chinese University of Hong Kong +2 more
Defends video recognition models against PGD and CW attacks via flow-matching purification with masking and frequency-gated loss
Input Manipulation Attack vision
Video recognition models remain vulnerable to adversarial attacks, while existing diffusion-based purification methods suffer from inefficient sampling and curved trajectories. Directly regressing clean videos from adversarial inputs often fails to recover faithful content due to the subtle nature of perturbations; this necessitates physically shattering the adversarial structure. We therefore propose Flow Matching for Adversarial Video Purification (FMVP). FMVP physically shatters global adversarial structures via a masking strategy and reconstructs clean video dynamics using Conditional Flow Matching (CFM) with an inpainting objective. To further decouple semantic content from adversarial noise, we design a Frequency-Gated Loss (FGL) that explicitly suppresses high-frequency adversarial residuals while preserving low-frequency fidelity. We design Attack-Aware and Generalist training paradigms to handle known and unknown threats, respectively. Extensive experiments on UCF-101 and HMDB-51 demonstrate that FMVP outperforms state-of-the-art methods (DiffPure, Defense Patterns (DP), Temporal Shuffling (TS), and FlowPure), achieving robust accuracy exceeding 87% against PGD and 89% against CW attacks. Furthermore, FMVP demonstrates superior robustness against adaptive attacks (DiffHammer) and functions as a zero-shot adversarial detector, attaining AUC-ROC scores of 0.98 for PGD and 0.79 for highly imperceptible CW attacks.
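The frequency-gated idea, weighting residual energy more heavily in the high-frequency band where adversarial noise concentrates, can be sketched with a 2D FFT on a single frame. The radial gate shape, cutoff, and weights below are hypothetical choices for illustration, not the paper's exact FGL design:

```python
import numpy as np

def frequency_gated_loss(pred, target, cutoff=0.25, hi_weight=4.0):
    """Sketch of a frequency-gated reconstruction loss: residual spectral
    energy outside a low-frequency disc is up-weighted, so high-frequency
    adversarial residue is penalised while low-frequency content dominates
    fidelity. Gate shape, cutoff, and hi_weight are assumptions."""
    resid = np.fft.fftshift(np.fft.fft2(pred - target))  # low freqs at centre
    h, w = resid.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)  # normalised radius
    gate = np.where(r < cutoff, 1.0, hi_weight)  # boost the high-freq band
    return float(np.mean(gate * np.abs(resid) ** 2))

# Identical frames incur zero loss; any residual produces a positive loss.
frame = np.zeros((32, 32))
assert frequency_gated_loss(frame, frame) == 0.0
```

In the paper's pipeline this kind of loss would sit on top of the masked CFM reconstruction; the gate simply biases training toward suppressing high-frequency residuals rather than matching them.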
cnn transformer Tsinghua University · The Chinese University of Hong Kong · University of New South Wales +1 more