Latest papers

29 papers
attack arXiv Apr 23, 2026 · 28d ago

Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach

Guilin Deng, Silong Chen, Yuchuan Luo et al. · National University of Defense Technology · City University of Hong Kong +1 more

Gradient-based membership inference attack on federated LLMs achieving near-perfect accuracy via projection residual analysis

Membership Inference Attack nlpfederated-learning
PDF Code
benchmark arXiv Apr 13, 2026 · 5w ago

NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya et al. · Lomonosov Moscow State University · Shenzhen University +14 more

Competition report on robust deepfake detection across 42 generators and 36 image transformations with 20 final solutions

Output Integrity Attack visiongenerative
PDF
defense arXiv Apr 11, 2026 · 5w ago

STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems

Guijia Zhang, Shu Yang, Xilin Gong et al. · Shenzhen University · King Abdullah University of Science & Technology +2 more

Runtime risk-scoring system for LLM agent tool calls that detects indirect prompt injection attacks before execution

Prompt Injection Insecure Plugin Design Excessive Agency nlp
PDF Code
defense arXiv Apr 2, 2026 · 7w ago

Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations

Yue Li, Linying Xue, Kaiqing Lin et al. · National Huaqiao University · Shenzhen University +2 more

Diffusion-guided adversarial perturbation defense protecting facial images from deepfake manipulation in both white-box and black-box settings

Input Manipulation Attack visiongenerative
PDF
tool arXiv Mar 31, 2026 · 7w ago

GazeCLIP: Gaze-Guided CLIP with Adaptive-Enhanced Fine-Grained Language Prompt for Deepfake Attribution and Detection

Yaning Zhang, Linlin Shen, Zitong Yu et al. · Qilu University of Technology · Shenzhen University +2 more

Deepfake detector using gaze patterns and CLIP-based vision-language matching to attribute and detect GAN/diffusion-generated faces

Output Integrity Attack visionmultimodal
PDF
defense arXiv Mar 27, 2026 · 7w ago

Gaussian Shannon: High-Precision Diffusion Model Watermarking Based on Communication

Yi Zhang, Hongbo Huang, Liang-Jie Zhang · Shenzhen University

Embeds bit-exact watermarks in diffusion model noise using error-correction codes for lossless AI image authentication and copyright tracking

Output Integrity Attack visiongenerative
PDF Code
tool arXiv Mar 24, 2026 · 8w ago

AgentFoX: LLM Agent-Guided Fusion with eXplainability for AI-Generated Image Detection

Yangxin Yu, Yue Zhou, Bin Li et al. · Shenzhen University · Sun Yat-Sen University +1 more

LLM-guided fusion framework that combines multiple forensic detectors to identify AI-generated images with explainable verdicts

Output Integrity Attack visionmultimodalnlp
PDF
attack arXiv Mar 20, 2026 · 8w ago

Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

Wenjing Hong, Zhonghua Rong, Li Wang et al. · Shenzhen University · Ltd +2 more

Automated multi-objective evolutionary search framework discovering diverse long-tail jailbreak attacks via encryption-decryption prompt transformations

Prompt Injection nlp
PDF
defense arXiv Mar 12, 2026 · 10w ago

ForensicZip: More Tokens are Better but Not Necessary in Forensic Vision-Language Models

Yingxin Lai, Zitong Yu, Jun Wang et al. · Great Bay University · Shenzhen University +2 more

Forensic-aware visual token pruning for deepfake/AIGC detection VLMs using Birth-Death Optimal Transport to preserve manipulation traces

Output Integrity Attack visionmultimodalnlp
PDF Code
attack arXiv Feb 6, 2026 · Feb 2026

Universal Anti-forensics Attack against Image Forgery Detection via Multi-modal Guidance

Haipeng Li, Rongxuan Peng, Anwei Luo et al. · Shenzhen University · Nanyang Technological University +2 more

Adversarial perturbations that evade AI-generated content detectors by manipulating shared CLIP embeddings toward authentic anchors

Input Manipulation Attack Output Integrity Attack visionmultimodal
PDF
defense arXiv Feb 2, 2026 · Feb 2026

MIRROR: Manifold Ideal Reference ReconstructOR for Generalizable AI-Generated Image Detection

Ruiqi Liu, Manni Cui, Ziheng Qin et al. · Institute of Automation · School of Advanced Interdisciplinary Sciences +7 more

Detects AI-generated images by projecting inputs to a real-image manifold and using reconstruction residuals as forgery signals, surpassing human experts

Output Integrity Attack visiongenerative
PDF Code
attack arXiv Feb 2, 2026 · Feb 2026

MarkCleaner: High-Fidelity Watermark Removal via Imperceptible Micro-Geometric Perturbation

Xiaoxi Kong, Jieyu Yuan, Pengdi Chen et al. · Shenzhen University · Nankai University

Removes semantic AI-image watermarks via micro-geometric perturbations that break phase alignment without semantic drift

Output Integrity Attack visiongenerative
PDF
defense arXiv Feb 2, 2026 · Feb 2026

Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models

Yue Zhou, Xinan He, Kaiqing Lin et al. · Shenzhen University · NanChang University +1 more

Linear classifiers on frozen Vision Foundation Models outperform specialized AIGI detectors by 30%+ in realistic in-the-wild scenarios

Output Integrity Attack vision
PDF
defense arXiv Jan 29, 2026 · Jan 2026

MPF-Net: Exposing High-Fidelity AI-Generated Video Forgeries via Hierarchical Manifold Deviation and Micro-Temporal Fluctuations

Xinan He, Kaiqing Lin, Yue Zhou et al. · NanChang University · Shenzhen University +3 more

Detects AI-generated video forgeries via hierarchical dual-path analysis of manifold deviations and structured inter-frame residual fingerprints

Output Integrity Attack vision
PDF
defense arXiv Dec 7, 2025 · Dec 2025

AlignGemini: Generalizable AI-Generated Image Detection Through Task-Model Alignment

Ruoxin Chen, Jiahui Gao, Kaiqing Lin et al. · Tencent · East China University of Science and Technology +2 more

Proposes task-model alignment combining VLMs and vision models for generalizable AI-generated image detection

Output Integrity Attack visionmultimodal
PDF
defense arXiv Nov 24, 2025 · Nov 2025

Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach

Fan Nie, Jiangqun Ni, Jian Zhang et al. · Sun Yat-Sen University · Pengcheng Laboratory +4 more

Novel variational Bayesian framework detects audio-visual deepfakes by modeling cross-modal inconsistencies as Gaussian latent variables

Output Integrity Attack multimodalvisionaudiogenerative
1 citations PDF
defense arXiv Nov 13, 2025 · Nov 2025

Fairness-Aware Deepfake Detection: Leveraging Dual-Mechanism Optimization

Feng Ding, Wenhui Yi, Yunpeng Zhou et al. · NanChang University · Shenzhen University +1 more

Fairness-aware deepfake detector using channel decoupling and distribution alignment to reduce demographic bias without sacrificing accuracy

Output Integrity Attack vision
PDF
attack arXiv Nov 11, 2025 · Nov 2025

Why does weak-OOD help? A Further Step Towards Understanding Jailbreaking VLMs

Yuxuan Zhou, Yuzhao Peng, Yang Bai et al. · Tsinghua University · ByteDance +4 more

Analyzes why mild OOD image manipulation best jailbreaks VLMs, then proposes JOCR, an OCR-based visual attack outperforming SOTA baselines

Input Manipulation Attack Prompt Injection visionmultimodalnlp
PDF
attack arXiv Nov 10, 2025 · Nov 2025

JPRO: Automated Multimodal Jailbreaking via Multi-Agent Collaboration Framework

Yuxuan Zhou, Yang Bai, Kuofeng Gao et al. · Tsinghua University · ByteDance +1 more

Multi-agent framework automates black-box jailbreaking of VLMs via coordinated image-text pair generation, achieving 60%+ ASR on GPT-4o

Prompt Injection multimodalnlp
PDF
defense arXiv Nov 10, 2025 · Nov 2025

Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation

Yuxuan Zhou, Tao Yu, Wen Huang et al. · Tsinghua University · CASIA +1 more

Trains deepfake detectors with RL-adaptive curriculum augmentation and causal inference to generalize across unseen forgery domains

Output Integrity Attack vision
PDF
Loading more papers…