Latest papers

54 papers
defense arXiv Apr 30, 2026 · 21d ago

Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection

Shuchang Zhou, Shangkun Wu, Jiwei Wei et al. · University of Electronic Science and Technology of China · Harbin Institute of Technology

Detects AI-generated images by fusing frequency-domain artifacts with semantic features via gated injection and hyperspherical learning

Output Integrity Attack · vision · generative
PDF
defense arXiv Apr 27, 2026 · 24d ago

Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing

Kaisheng Fan, Weizhe Zhang, Yishu Gao et al. · Harbin Institute of Technology · Peng Cheng Laboratory +1 more

Plug-and-play inference-time backdoor defense detecting trigger-induced attention collapse in LLMs without parameter updates or latency overhead

Model Poisoning · Training Data Poisoning · nlp
PDF
defense arXiv Apr 17, 2026 · 4w ago

AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection

Hao Wang, Beichen Zhang, Yanpei Gong et al. · Harbin Institute of Technology

Incremental deepfake detector using semantic anchors from artifact cues to prevent catastrophic forgetting across new forgery types

Output Integrity Attack · vision · multimodal
PDF
defense arXiv Apr 14, 2026 · 5w ago

Efficient Adversarial Training via Criticality-Aware Fine-Tuning

Wenyun Li, Zheng Zhang, Dongmei Jiang et al. · Harbin Institute of Technology · Pengcheng Laboratory +1 more

Parameter-efficient adversarial training for Vision Transformers achieving near-full robustness while fine-tuning only 6% of parameters

Input Manipulation Attack · vision
PDF
benchmark arXiv Apr 13, 2026 · 5w ago

C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts

Chenxi Qing, Junxi Wu, Zheng Liu et al. · Tsinghua University · Nankai University +2 more

Chinese benchmark for AI-generated text detection with real-world prompts across nine LLMs and multiple domains

Output Integrity Attack · nlp
PDF Code
benchmark arXiv Apr 13, 2026 · 5w ago

NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya et al. · Lomonosov Moscow State University · Shenzhen University +14 more

Competition report on robust deepfake detection across 42 generators and 36 image transformations with 20 final solutions

Output Integrity Attack · vision · generative
PDF
attack arXiv Apr 10, 2026 · 5w ago

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

Weiyang Guo, Zesheng Shi, Zeen Zhu et al. · Harbin Institute of Technology · Huawei Technologies

Backdoor attack on RLVR-trained LLMs that implants jailbreak triggers using 2% poisoned data, degrading safety by 73%

Model Poisoning · Transfer Learning Attack · Prompt Injection · nlp · reinforcement-learning
PDF Code
defense arXiv Apr 6, 2026 · 6w ago

Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale

Zhengcen Li, Chenyang Jiang, Hang Zhao et al. · Harbin Institute of Technology · Peng Cheng Laboratory +1 more

Vision transformer detector operating at native video resolution to preserve high-frequency forgery artifacts in AI-generated videos

Output Integrity Attack · vision · multimodal
PDF
attack arXiv Apr 2, 2026 · 7w ago

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Jiawei Chen, Simin Huang, Jiawei Du et al. · East China Normal University · Zhongguancun Academy +3 more

Physically realizable 3D adversarial textures that degrade vision-language-action robot models with 96.7% task failure rates

Input Manipulation Attack · vision · multimodal · nlp
PDF Code
attack arXiv Apr 1, 2026 · 7w ago

Enhancing Gradient Inversion Attacks in Federated Learning via Hierarchical Feature Optimization

Hao Fang, Wenbo Yu, Bin Chen et al. · Tsinghua University · Harbin Institute of Technology

GAN-based gradient inversion attack reconstructing client training data from FL gradients via hierarchical feature optimization

Model Inversion Attack · vision · federated-learning
PDF
defense arXiv Mar 31, 2026 · 7w ago

AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

Yubo Cui, Xianchao Guan, Zijun Xiong et al. · Harbin Institute of Technology · Shenzhen Loop Area Institute

Adversarial fine-tuning framework that preserves vision-language alignment while defending CLIP against adversarial perturbations in zero-shot settings

Input Manipulation Attack · vision · nlp · multimodal
PDF Code
defense arXiv Mar 25, 2026 · 8w ago

Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection

Jielun Peng, Yabin Wang, Yaqi Li et al. · Harbin Institute of Technology

Multimodal deepfake detector learning audio-visual coherence patterns to identify synthetic videos from commercial generators

Output Integrity Attack · multimodal · audio · vision
PDF Code
attack arXiv Mar 24, 2026 · 8w ago

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu et al. · Zhejiang University · Harbin Institute of Technology

Targeted adversarial patch attack hijacks VLA robotic control by corrupting CoT reasoning to induce specific malicious behaviors

Input Manipulation Attack · multimodal · vision · nlp
PDF
attack arXiv Mar 18, 2026 · 9w ago

TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang et al. · Harbin Institute of Technology · City University of Hong Kong +3 more

Text-free inversion attack that recovers supposedly erased concepts from diffusion models by exploiting persistent visual knowledge

Model Inversion Attack · vision · generative
PDF
defense arXiv Mar 12, 2026 · 10w ago

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models

Yingxin Lai, Zitong Yu, Jun Wang et al. · Great Bay University · Shenzhen University +2 more

Forensic-aware visual token pruning for deepfake/AIGC detection VLMs using Birth-Death Optimal Transport to preserve manipulation traces

Output Integrity Attack · vision · multimodal · nlp
PDF Code
defense arXiv Mar 2, 2026 · 11w ago

Explanation-Guided Adversarial Training for Robust and Interpretable Models

Chao Chen, Yanhui Chen, Shanshan Lin et al. · Harbin Institute of Technology · Fuzhou University +1 more

Adversarial training framework that uses explanation-guided constraints to improve robustness and saliency map stability against adversarial attacks

Input Manipulation Attack · vision
PDF
attack arXiv Mar 2, 2026 · 11w ago

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

Duoxun Tang, Dasen Dai, Jiyao Wang et al. · Tsinghua University · The Chinese University of Hong Kong +4 more

Universal sponge attack on Video-LLMs inflates token generation 205× and inference latency 15× via optimized adversarial video frame triggers

Input Manipulation Attack · Model Denial of Service · multimodal · vision · nlp
PDF Code
tool arXiv Feb 21, 2026 · 12w ago

FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

Zhou Liu, Tonghua Su, Hongshi Zhang et al. · Harbin Institute of Technology · DZ-Matrix +3 more

Multimodal LLM system detects and localizes AI-generated image forgeries by fusing RGB and frequency-domain forensic features

Output Integrity Attack · vision · multimodal
PDF
defense arXiv Feb 12, 2026 · Feb 2026

SafeNeuron: Neuron-Level Safety Alignment for Large Language Models

Zhaoxin Wang, Jiaming Liang, Fengbin Zhu et al. · Xidian University · National University of Singapore +1 more

Defends LLM safety alignment against neuron pruning attacks by redistributing safety representations across the network via selective neuron freezing

Prompt Injection · nlp · multimodal
PDF
attack arXiv Feb 9, 2026 · Feb 2026

Generating Adversarial Events: A Motion-Aware Point Cloud Framework

Hongwei Ren, Youxin Jiang, Qifei Gu et al. · Harbin Institute of Technology

Proposes gradient-based adversarial attack on event-camera DNNs via point cloud bridge, achieving 100% success rate with minimal perturbation

Input Manipulation Attack · vision
PDF