ML Security Papers

Latest papers

54 papers

defense arXiv Apr 30, 2026 · 21d ago

Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection

Shuchang Zhou, Shangkun Wu, Jiwei Wei et al. · University of Electronic Science and Technology of China · Harbin Institute of Technology

Detects AI-generated images by fusing frequency-domain artifacts with semantic features via gated injection and hyperspherical learning

Output Integrity Attack visiongenerative

PDF

defense arXiv Apr 27, 2026 · 24d ago

Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing

Kaisheng Fan, Weizhe Zhang, Yishu Gao et al. · Harbin Institute of Technology · Peng Cheng Laboratory +1 more

Plug-and-play inference-time backdoor defense detecting trigger-induced attention collapse in LLMs without parameter updates or latency overhead

Model Poisoning Training Data Poisoning nlp

PDF

defense arXiv Apr 17, 2026 · 4w ago

AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection

Hao Wang, Beichen Zhang, Yanpei Gong et al. · Harbin Institute of Technology

Incremental deepfake detector using semantic anchors from artifact cues to prevent catastrophic forgetting across new forgery types

Output Integrity Attack visionmultimodal

PDF

defense arXiv Apr 14, 2026 · 5w ago

Efficient Adversarial Training via Criticality-Aware Fine-Tuning

Wenyun Li, Zheng Zhang, Dongmei Jiang et al. · Harbin Institute of Technology · Pengcheng Laboratory +1 more

Parameter-efficient adversarial training for Vision Transformers achieving near-full robustness while fine-tuning only 6% of parameters

Input Manipulation Attack vision

PDF

benchmark arXiv Apr 13, 2026 · 5w ago

C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts

Chenxi Qing, Junxi Wu, Zheng Liu et al. · Tsinghua University · Nankai University +2 more

Chinese benchmark for AI-generated text detection with real-world prompts across nine LLMs and multiple domains

Output Integrity Attack nlp

PDF Code

benchmark arXiv Apr 13, 2026 · 5w ago

NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya et al. · Lomonosov Moscow State University · Shenzhen University +14 more

Competition report on robust deepfake detection across 42 generators and 36 image transformations with 20 final solutions

Output Integrity Attack visiongenerative

PDF

attack arXiv Apr 10, 2026 · 5w ago

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

Weiyang Guo, Zesheng Shi, Zeen Zhu et al. · Harbin Institute of Technology · Huawei Technologies

Backdoor attack on RLVR-trained LLMs that implants jailbreak triggers using 2% poisoned data, degrading safety by 73%

Model Poisoning Transfer Learning Attack Prompt Injection nlpreinforcement-learning

PDF Code

defense arXiv Apr 6, 2026 · 6w ago

Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale

Zhengcen Li, Chenyang Jiang, Hang Zhao et al. · Harbin Institute of Technology · Peng Cheng Laboratory +1 more

Vision transformer detector operating at native video resolution to preserve high-frequency forgery artifacts in AI-generated videos

Output Integrity Attack visionmultimodal

PDF

attack arXiv Apr 2, 2026 · 7w ago

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Jiawei Chen, Simin Huang, Jiawei Du et al. · East China Normal University · Zhongguancun Academy +3 more

Physically realizable 3D adversarial textures that degrade vision-language-action robot models with 96.7% task failure rates

Input Manipulation Attack visionmultimodalnlp

PDF Code

Vision-language-action (VLA) models have shown strong performance in robotic manipulation, yet their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either less representative of real deployment or limited in physical realism. In contrast, adversarial 3D textures pose a more physically plausible and damaging threat, as they are naturally attached to manipulated objects and are easier to deploy in physical environments. Bringing adversarial 3D textures to VLA systems is nevertheless nontrivial. A central obstacle is that standard 3D simulators do not provide a differentiable optimization path from the VLA objective function back to object appearance, making it difficult to optimize through an end-to-end manner. To address this, we introduce Foreground-Background Decoupling (FBD), which enables differentiable texture optimization through dual-renderer alignment while preserving the original simulation environment. To further ensure that the attack remains effective across long-horizon and diverse viewpoints in the physical world, we propose Trajectory-Aware Adversarial Optimization (TAAO), which prioritizes behaviorally critical frames and stabilizes optimization with a vertex-based parameterization. Built on these designs, we present Tex3D, the first framework for end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment. Experiments in both simulation and real-robot settings show that Tex3D significantly degrades VLA performance across multiple manipulation tasks, achieving task failure rates of up to 96.7\%. Our empirical results expose critical vulnerabilities of VLA systems to physically grounded 3D adversarial attacks and highlight the need for robustness-aware training.

vlm multimodal transformer East China Normal University · Zhongguancun Academy · A*STAR +2 more

PDF arXiv Code

attack arXiv Apr 1, 2026 · 7w ago

Enhancing Gradient Inversion Attacks in Federated Learning via Hierarchical Feature Optimization

Hao Fang, Wenbo Yu, Bin Chen et al. · Tsinghua University · Harbin Institute of Technology

GAN-based gradient inversion attack reconstructing client training data from FL gradients via hierarchical feature optimization

Model Inversion Attack visionfederated-learning

PDF

defense arXiv Mar 31, 2026 · 7w ago

AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

Yubo Cui, Xianchao Guan, Zijun Xiong et al. · Harbin Institute of Technology · Shenzhen Loop Area Institute

Adversarial fine-tuning framework that preserves vision-language alignment while defending CLIP against adversarial perturbations in zero-shot settings

Input Manipulation Attack visionnlpmultimodal

PDF Code

defense arXiv Mar 25, 2026 · 8w ago

Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection

Jielun Peng, Yabin Wang, Yaqi Li et al. · Harbin Institute of Technology

Multimodal deepfake detector learning audio-visual coherence patterns to identify synthetic videos from commercial generators

Output Integrity Attack multimodalaudiovision

PDF Code

attack arXiv Mar 24, 2026 · 8w ago

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu et al. · Zhejiang University · Harbin Institute of Technology

Targeted adversarial patch attack hijacks VLA robotic control by corrupting CoT reasoning to induce specific malicious behaviors

Input Manipulation Attack multimodalvisionnlp

PDF

attack arXiv Mar 18, 2026 · 9w ago

TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang et al. · Harbin Institute of Technology · City University of Hong Kong +3 more

Text-free inversion attack that recovers supposedly erased concepts from diffusion models by exploiting persistent visual knowledge

Model Inversion Attack visiongenerative

PDF

defense arXiv Mar 12, 2026 · 10w ago

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models

Yingxin Lai, Zitong Yu, Jun Wang et al. · Great Bay University · Shenzhen University +2 more

Forensic-aware visual token pruning for deepfake/AIGC detection VLMs using Birth-Death Optimal Transport to preserve manipulation traces

Output Integrity Attack visionmultimodalnlp

PDF Code

defense arXiv Mar 2, 2026 · 11w ago

Explanation-Guided Adversarial Training for Robust and Interpretable Models

Chao Chen, Yanhui Chen, Shanshan Lin et al. · Harbin Institute of Technology · Fuzhou University +1 more

Adversarial training framework combining explanation-guided constraints to improve robustness and saliency map stability against adversarial attacks

Input Manipulation Attack vision

PDF

attack arXiv Mar 2, 2026 · 11w ago

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

Duoxun Tang, Dasen Dai, Jiyao Wang et al. · Tsinghua University · The Chinese University of Hong Kong +4 more

Universal sponge attack on Video-LLMs inflates token generation 205× and inference latency 15× via optimized adversarial video frame triggers

Input Manipulation Attack Model Denial of Service multimodalvisionnlp

PDF Code

tool arXiv Feb 21, 2026 · 12w ago

FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

Zhou Liu, Tonghua Su, Hongshi Zhang et al. · Harbin Institute of Technology · DZ-Matrix +3 more

Multimodal LLM system detects and localizes AI-generated image forgeries by fusing RGB and frequency-domain forensic features

Output Integrity Attack visionmultimodal

PDF

defense arXiv Feb 12, 2026 · Feb 2026

SafeNeuron: Neuron-Level Safety Alignment for Large Language Models

Zhaoxin Wang, Jiaming Liang, Fengbin Zhu et al. · Xidian University · National University of Singapore +1 more

Defends LLM safety alignment against neuron pruning attacks by redistributing safety representations across the network via selective neuron freezing

Prompt Injection nlpmultimodal

PDF

attack arXiv Feb 9, 2026 · Feb 2026

Generating Adversarial Events: A Motion-Aware Point Cloud Framework

Hongwei Ren, Youxin Jiang, Qifei Gu et al. · Harbin Institute of Technology

Proposes gradient-based adversarial attack on event-camera DNNs via point cloud bridge, achieving 100% success rate with minimal perturbation

Input Manipulation Attack vision

PDF

Loading more papers…

Latest papers

Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection

Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing

AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection

Efficient Adversarial Training via Criticality-Aware Fine-Tuning

C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts

NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Enhancing Gradient Inversion Attacks in Federated Learning via Hierarchical Feature Optimization

AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models

Explanation-Guided Adversarial Training for Robust and Interpretable Models

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

SafeNeuron: Neuron-Level Safety Alignment for Large Language Models

Generating Adversarial Events: A Motion-Aware Point Cloud Framework

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue