Published on arXiv

2510.18362

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Achieves >70% attack success rate against C3D and I3D video classifiers with zero queries, and bypasses Video-LLM harmful-content recognition with >70% probability while remaining visually imperceptible.

FeatureFool

Novel technique introduced


The vulnerability of deep neural networks (DNNs) has been preliminarily verified. Existing black-box adversarial attacks usually require multi-round interaction with the model and consume numerous queries, which is impractical in real-world settings and hard to scale to recently emerged Video-LLMs. Moreover, no attack in the video domain directly leverages feature maps to shift the clean-video feature space. We therefore propose FeatureFool, a stealthy, video-domain, zero-query black-box attack that uses information extracted from a DNN to alter the feature space of clean videos. Unlike query-based methods that rely on iterative interaction, FeatureFool performs a zero-query attack by directly exploiting DNN-extracted information; this efficient approach is unprecedented in the video domain. Experiments show that FeatureFool achieves an attack success rate above 70% against traditional video classifiers without any queries. Benefiting from the transferability of the feature map, it can also craft harmful content that bypasses Video-LLM recognition. Additionally, adversarial videos generated by FeatureFool exhibit high quality in terms of SSIM, PSNR, and Temporal-Inconsistency, making the attack barely perceptible. This paper may contain violent or explicit content.
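The "DNN-extracted information" here refers to feature maps obtained via Guided Back-propagation, which passes a gradient through a ReLU only where both the forward activation and the incoming gradient are positive. A minimal pure-Python sketch of that gating rule (a single toy ReLU layer with illustrative values, not the paper's C3D/I3D setting):

```python
# Sketch of the ReLU gating rule used by Guided Back-propagation,
# contrasted with vanilla backprop. Single layer, toy values only.

def vanilla_backprop_relu(upstream_grad, pre_activation):
    # Standard ReLU backward pass: gate only on the forward activation.
    return [g if a > 0 else 0.0
            for g, a in zip(upstream_grad, pre_activation)]

def guided_backprop_relu(upstream_grad, pre_activation):
    # Guided backprop: additionally zero out negative incoming gradients,
    # keeping only paths that positively support the target activation.
    return [g if (a > 0 and g > 0) else 0.0
            for g, a in zip(upstream_grad, pre_activation)]

pre_act = [1.5, -0.5, 2.0, 0.3]   # forward pre-activations
grad = [0.7, 0.9, -1.2, 0.4]      # gradient arriving from the layer above

print(vanilla_backprop_relu(grad, pre_act))  # [0.7, 0.0, -1.2, 0.4]
print(guided_backprop_relu(grad, pre_act))   # [0.7, 0.0, 0.0, 0.4]
```

Because negative evidence is suppressed, the resulting map highlights only features that push the prediction toward the class of interest, which is what makes it useful as a transferable perturbation direction.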


Key Contributions

  • First zero-query black-box adversarial attack in the video domain that exploits feature maps from Guided Back-propagation without any model interaction.
  • Novel pipeline coupling Maximum-Optical-Flow frame selection with Guided Back-propagation to extract semantically strong feature-map perturbations broadcast across all frames.
  • Demonstrated >70% ASR against traditional video classifiers (C3D, I3D) and >70% bypass rate against Video-LLMs on harmful content with high visual quality (SSIM > 0.87, PSNR > 28 dB).
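The pipeline in the second bullet can be sketched as two stages: pick the highest-motion frame, then broadcast a feature-map-signed perturbation across every frame. In this toy sketch, mean absolute frame difference stands in for optical-flow magnitude, and a precomputed saliency vector stands in for the Guided Back-propagation feature map; the function names, sign step, and epsilon budget are illustrative assumptions, not the paper's exact method:

```python
# Toy two-stage sketch: max-motion frame selection, then broadcasting
# a saliency-signed perturbation to all frames (clipped to pixel range).

def motion_score(frame_a, frame_b):
    # Crude stand-in for optical-flow magnitude between two frames.
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def select_key_frame(frames):
    # Pick the frame with maximum motion relative to its predecessor.
    scores = [motion_score(frames[i - 1], frames[i])
              for i in range(1, len(frames))]
    return scores.index(max(scores)) + 1

def broadcast_perturbation(frames, saliency, epsilon=4.0):
    # FGSM-style sign step along the saliency map, applied to every frame.
    delta = [epsilon if s > 0 else -epsilon if s < 0 else 0.0
             for s in saliency]
    return [[min(255.0, max(0.0, p + d)) for p, d in zip(frame, delta)]
            for frame in frames]

# Tiny 4-frame, 3-pixel "video": the jump into frame 2 has the most motion.
video = [[10.0, 10, 10], [12.0, 11, 10], [40.0, 35, 30], [41.0, 36, 30]]
key = select_key_frame(video)           # -> 2
saliency = [1.0, -0.5, 0.0]             # stand-in feature map from that frame
adv = broadcast_perturbation(video, saliency)
```

Extracting the map from the single most dynamic frame and reusing it everywhere is what keeps the attack zero-query: no further forward or backward passes through the target are needed per frame.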

🛡️ Threat Analysis

Input Manipulation Attack

The core contribution is crafting imperceptible, feature-map-guided adversarial perturbations on video inputs at inference time, achieving >70% ASR against C3D and I3D video classifiers without querying the target model.
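The imperceptibility claim is quantified with SSIM, PSNR, and Temporal-Inconsistency; PSNR is the simplest to verify. A minimal sketch over a single flattened frame (the values are illustrative; the paper reports PSNR > 28 dB on its benchmarks):

```python
import math

def psnr(clean, adv, max_val=255.0):
    # Peak signal-to-noise ratio between a clean frame and its
    # adversarial counterpart, both flattened to 1-D pixel lists.
    mse = sum((c - a) ** 2 for c, a in zip(clean, adv)) / len(clean)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)

clean = [100.0, 120.0, 140.0, 160.0]
adv   = [102.0, 118.0, 141.0, 159.0]  # small, bounded perturbation
print(psnr(clean, adv))  # ≈ 44.15 dB, well above the ~28 dB threshold
```

Higher PSNR means a smaller mean-squared perturbation, so staying above the reported threshold is consistent with the attack being hard to notice by eye.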


Details

Domains
vision, multimodal
Model Types
cnn, vlm, transformer
Threat Tags
black_box, inference_time, targeted, digital
Datasets
HMDB-51, UCF-101, Kinetics-400, UCF-Crime
Applications
video classification, video-llm content moderation