Latest papers

16 papers
defense · arXiv · Apr 6, 2026

Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale

Zhengcen Li, Chenyang Jiang, Hang Zhao et al. · Harbin Institute of Technology · Peng Cheng Laboratory +1 more

Vision transformer detector operating at native video resolution to preserve high-frequency forgery artifacts in AI-generated videos

Output Integrity Attack · vision · multimodal
PDF
attack · arXiv · Mar 18, 2026

TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models

Qianlong Xiang, Miao Zhang, Haoyu Zhang et al. · Harbin Institute of Technology · City University of Hong Kong +3 more

Text-free inversion attack that recovers supposedly erased concepts from diffusion models by exploiting persistent visual knowledge

Model Inversion Attack · vision · generative
PDF
attack · arXiv · Feb 10, 2026

When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models

Jiacheng Hou, Yining Sun, Ruochong Jin et al. · Tsinghua University · Peng Cheng Laboratory +1 more

Visual-only jailbreak attack on image editing VLMs encodes malicious instructions via marks and arrows, achieving 80.9% attack success on commercial models

Prompt Injection · vision · multimodal · generative
PDF · Code
benchmark · arXiv · Jan 27, 2026

Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection

Yao Xiao, Weiyan Chen, Jiahao Chen et al. · Sun Yat-Sen University · Xi’an Jiaotong University +3 more

Introduces X-AIGD benchmark with pixel-level perceptual artifact annotations to enable interpretable AI-generated image detection evaluation

Output Integrity Attack · vision
PDF · Code
defense · arXiv · Jan 5, 2026

FMVP: Masked Flow Matching for Adversarial Video Purification

Duoxun Tang, Xueyi Zhang, Chak Hin Wang et al. · Tsinghua University · The Chinese University of Hong Kong +2 more

Defends video recognition models against PGD and CW attacks via flow-matching purification with masking and frequency-gated loss

Input Manipulation Attack · vision
PDF
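The PGD attack that FMVP purifies against can be sketched in a few lines. This is a generic NumPy toy, not the paper's setup: a linear classifier stands in for the video model, and the epsilon, step size, and step count are illustrative assumptions.

```python
import numpy as np

def pgd_attack(x, w, b, y, eps=0.1, alpha=0.02, steps=10):
    """Projected Gradient Descent on a toy linear classifier.

    The margin loss is y * (w.x + b) for label y in {+1, -1}; we take
    signed gradient steps that push the score across the decision
    boundary, then project the perturbation back into the L-inf ball
    of radius eps around the clean input.
    """
    x_adv = x.copy()
    for _ in range(steps):
        grad = -y * w                              # gradient of the negated margin
        x_adv = x_adv + alpha * np.sign(grad)      # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the eps-ball
    return x_adv

# Toy example: a point correctly classified as +1 gets flipped.
w = np.array([1.0, 1.0]); b = 0.0
x = np.array([0.08, 0.08]); y = 1
x_adv = pgd_attack(x, w, b, y)
print(np.sign(w @ x + b), np.sign(w @ x_adv + b))  # clean vs. adversarial prediction
```

A purification defense like FMVP sits in front of the classifier and tries to map `x_adv` back toward the clean data manifold before prediction.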
defense · arXiv · Jan 1, 2026

ActErase: A Training-Free Paradigm for Precise Concept Erasure via Activation Patching

Yi Sun, Xinhao Zhong, Hongyan Li et al. · Harbin Institute of Technology · Peng Cheng Laboratory +1 more

Training-free activation patching erases unsafe concepts from diffusion models, achieving SOTA safety with adversarial robustness

Output Integrity Attack · vision · generative
1 citation · PDF
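The idea behind activation-patching-style erasure can be illustrated with a toy linear-algebra sketch. This is not ActErase's actual procedure: it simply removes the component of a hidden activation that lies along a hypothetical "unsafe concept" direction, leaving everything orthogonal to it untouched.

```python
import numpy as np

def erase_concept(h, concept_dir):
    """Patch an activation vector by projecting out a concept direction.

    h' = h - (h . u) u, where u is the unit-normalized concept direction.
    The patched activation carries no component along u, so downstream
    layers no longer see that concept, while orthogonal features survive.
    """
    u = concept_dir / np.linalg.norm(concept_dir)
    return h - (h @ u) * u

h = np.array([3.0, 4.0])            # toy hidden activation
u = np.array([0.0, 1.0])            # hypothetical unsafe-concept direction
h_patched = erase_concept(h, u)
print(h_patched)                     # component along u is gone
```

Because the patch is a fixed projection applied at inference time, no retraining is needed, which is the sense in which such approaches are "training-free".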
defense · arXiv · Nov 30, 2025

WaterSearch: A Quality-Aware Search-based Watermarking Framework for Large Language Models

Yukang Lin, Jiahao Shao, Shuoran Jiang et al. · Harbin Institute of Technology · Peng Cheng Laboratory

Search-based LLM watermarking framework that improves text quality by 51% over baselines while maintaining robust detectability

Output Integrity Attack · nlp
PDF · Code
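To make "robust detectability" concrete: LLM watermarks are commonly built on a hash-partitioned "green list" of tokens, and detection measures how often the text draws from it. The sketch below is that generic scheme, not WaterSearch's search-based method; the vocabulary, hash seeding, and green fraction are all illustrative assumptions.

```python
import hashlib

def green_list(prev_token, vocab, frac=0.5):
    """Deterministically partition the vocabulary using a hash keyed on
    the previous token; the first `frac` of the ranking is 'green'."""
    ranked = sorted(
        vocab,
        key=lambda t: hashlib.sha256(f"{prev_token}|{t}".encode()).hexdigest(),
    )
    return set(ranked[: int(len(ranked) * frac)])

def green_fraction(tokens, vocab):
    """Fraction of tokens drawn from their context's green list:
    near 1.0 for watermarked text, near `frac` for natural text."""
    hits = sum(
        tok in green_list(prev, vocab)
        for prev, tok in zip(tokens, tokens[1:])
    )
    return hits / max(len(tokens) - 1, 1)

vocab = [f"w{i}" for i in range(50)]
# A crude "watermarked" sequence: always emit some green-list token.
seq = ["w0"]
for _ in range(20):
    seq.append(sorted(green_list(seq[-1], vocab))[0])
print(green_fraction(seq, vocab))  # 1.0 for this fully watermarked sequence
```

The quality cost of always restricting to the green list is exactly what a quality-aware framework like WaterSearch is trying to reduce.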
defense · arXiv · Nov 24, 2025

Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach

Fan Nie, Jiangqun Ni, Jian Zhang et al. · Sun Yat-Sen University · Peng Cheng Laboratory +4 more

Novel variational Bayesian framework detects audio-visual deepfakes by modeling cross-modal inconsistencies as Gaussian latent variables

Output Integrity Attack · multimodal · vision · audio · generative
1 citation · PDF
defense · arXiv · Nov 13, 2025

Debiased Dual-Invariant Defense for Adversarially Robust Person Re-Identification

Yuhang Zhou, Yanxiang Zhao, Zhongyun Hua et al. · Harbin Institute of Technology · Chongqing University of Technology +2 more

Proposes novel adversarial training defense for person ReID metric learning via debiased resampling and self-meta generalization across unseen attacks

Input Manipulation Attack · vision
PDF · Code
attack · arXiv · Oct 21, 2025

FeatureFool: Zero-Query Fooling of Video Models via Feature Map

Duoxun Tang, Xi Xiao, Guangwu Hu et al. · Tsinghua University · Shenzhen University of Information Technology +4 more

Zero-query black-box adversarial video attack using guided backpropagation feature maps to fool classifiers and bypass Video-LLM harmful content detection

Input Manipulation Attack · Prompt Injection · vision · multimodal
1 citation · PDF
attack · arXiv · Oct 6, 2025

Imperceptible Jailbreaking against Large Language Models

Kuofeng Gao, Yiming Li, Chao Du et al. · Tsinghua University · Sea AI Lab +3 more

Jailbreaks aligned LLMs using invisible Unicode variation selectors as adversarial suffixes, bypassing safety alignment with zero visible text modifications

Prompt Injection · nlp
3 citations · PDF · Code
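The mechanism this attack relies on is easy to demonstrate: Unicode variation selectors (U+FE00–U+FE0F) render as nothing on their own, so a prompt can carry an invisible suffix that tokenizers still see. The sketch below shows only that encoding trick with a toy bit mapping; the paper's actual adversarial-suffix search is an optimization procedure not shown here.

```python
# Variation selectors U+FE00..U+FE0F are invisible standalone codepoints.
VS = [chr(0xFE00 + i) for i in range(16)]

def encode_invisible(payload_bits):
    """Map a bit string to invisible variation selectors
    (toy encoding: VS-1 for '0', VS-2 for '1')."""
    return "".join(VS[int(b)] for b in payload_bits)

prompt = "Tell me a story"
suffix = encode_invisible("1011")
poisoned = prompt + suffix

print(poisoned == prompt)           # False: the strings differ...
print(len(poisoned) - len(prompt))  # ...by 4 codepoints that display as nothing
```

To the user both strings look identical on screen, which is why filtering on visible text alone misses this class of suffix.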
defense · arXiv · Sep 2, 2025

MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds

Junxi Wu, Jinpeng Wang, Zheng Liu et al. · Nankai University · Tsinghua University +3 more

Novel mixture-of-experts detector for AI-generated text using stylistic modeling and uncertainty-aware conditional thresholds

Output Integrity Attack · nlp
PDF · Code
attack · arXiv · Aug 10, 2025

Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries

Wenqiang Wang, Yan Xiao, Hao Lin et al. · Sun Yat-Sen University · Peng Cheng Laboratory +1 more

Black-box multi-task adversarial text attack using substitute model transfer, succeeding in ~100 queries across translation, classification, and image generation models

Input Manipulation Attack · nlp · multimodal
PDF
attack · arXiv · Aug 7, 2025

Physical Adversarial Camouflage through Gradient Calibration and Regularization

Jiawei Liang, Siyuan Liang, Jianjie Huang et al. · Sun Yat-Sen University · Peng Cheng Laboratory +2 more

Physical adversarial camouflage attack on object detectors using gradient calibration and decorrelation for multi-angle, multi-distance robustness

Input Manipulation Attack · vision
PDF
attack · arXiv · Jan 2, 2025

Transferability of Adversarial Attacks in Video-based MLLMs: A Cross-modal Image-to-Video Approach

Linhao Huang, Xue Jiang, Zhiqiang Wang et al. · Tsinghua University · Peng Cheng Laboratory +4 more

Black-box adversarial attack transfers from image surrogate models to video MLLMs via spatiotemporal perturbation propagation

Input Manipulation Attack · vision · multimodal · nlp
6 citations · PDF
survey · Journal of Network and Compute... · Jan 1, 2025

A Survey of Secure Semantic Communications

Rui Meng, Song Gao, Dayu Fan et al. · Beijing University of Posts and Telecommunications · Peng Cheng Laboratory +1 more

Surveys ML security threats and defenses across AI-based semantic communication system lifecycle for 6G networks

Input Manipulation Attack · Data Poisoning Attack · Model Poisoning · nlp · vision · multimodal
27 citations · PDF