Latest papers

3,803 papers
benchmark · arXiv · Apr 13, 2026

NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild

Aleksandr Gushchin, Khaled Abud, Ekaterina Shumitskaya et al. · Lomonosov Moscow State University · Shenzhen University +14 more

Competition report on robust deepfake detection across 42 generators and 36 image transformations with 20 final solutions

Output Integrity Attack · vision · generative
PDF
attack · arXiv · Apr 13, 2026

Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models

Shuhao Zhang, Yuli Chen, Jiale Han et al. · Beijing University of Posts and Telecommunications · Hong Kong University of Science and Technology

Adaptive attack that steals LLM text watermarks by dynamically selecting optimal attack perspectives during generation

Output Integrity Attack · nlp
PDF Code
defense · arXiv · Apr 13, 2026

QShield: Securing Neural Networks Against Adversarial Attacks using Quantum Circuits

Navid Azimi, Aditya Prakash, Yao Wang et al. · Emory University

Hybrid quantum-classical CNN defense using quantum entanglement circuits to reduce adversarial attack success rates on image classifiers

Input Manipulation Attack · vision
PDF
defense · arXiv · Apr 13, 2026

LRD-Net: A Lightweight Real-Centered Detection Network for Cross-Domain Face Forgery Detection

Xuecen Zhang, Vipin Chaudhary · Case Western Reserve University

Lightweight deepfake detector using frequency-guided architecture and real-centered learning for cross-domain face forgery detection on mobile devices

Output Integrity Attack · vision · multimodal
PDF
attack · arXiv · Apr 13, 2026

The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

Yihao Zhang, Kai Wang, Jiangrong Wu et al. · Peking University · Sun Yat-Sen University +4 more

Multi-turn jailbreak attack that chains low-risk prompts to cumulatively bypass LLM safety guardrails across modalities

Prompt Injection · nlp · multimodal
PDF
benchmark · arXiv · Apr 13, 2026

C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts

Chenxi Qing, Junxi Wu, Zheng Liu et al. · Tsinghua University · Nankai University +2 more

Chinese benchmark for AI-generated text detection with real-world prompts across nine LLMs and multiple domains

Output Integrity Attack · nlp
PDF Code
defense · arXiv · Apr 13, 2026

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

Wei Zhao, Zhe Li, Peixin Zhang et al. · Singapore Management University

Runtime framework enforcing user-confirmed rules at tool-call boundaries to block indirect prompt injection across web, MCP, and skill channels

Prompt Injection · Insecure Plugin Design · nlp
PDF Code
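The rule-enforcement idea behind such a framework can be illustrated with a minimal sketch (the class, rule format, and tool names below are hypothetical, not ClawGuard's actual design): every outgoing tool call is checked against rules the user explicitly confirmed, so instructions injected through retrieved web pages, MCP responses, or skill files cannot trigger unapproved calls.

```python
class ToolGuard:
    """Minimal sketch of rule enforcement at the tool-call boundary."""

    def __init__(self):
        self.allowed = set()  # (tool, argument) pairs the user has confirmed

    def confirm(self, tool, arg_key):
        """Record a user-confirmed rule covering one tool argument."""
        self.allowed.add((tool, arg_key))

    def check(self, tool, args):
        """Permit a call only if every argument is covered by a confirmed rule."""
        return all((tool, k) in self.allowed for k in args)

guard = ToolGuard()
guard.confirm("send_email", "to")
assert guard.check("send_email", {"to": "user@example.com"})
# A call injected via retrieved content, never confirmed by the user, is blocked:
assert not guard.check("delete_files", {"path": "/"})
```

The key design point is that the check runs at runtime, outside the model, so it holds even when the LLM itself has been steered by an injected instruction.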
attack · arXiv · Apr 13, 2026

On the Robustness of Watermarking for Autoregressive Image Generation

Andreas Müller, Denis Lukovnikov, Shingo Kodama et al. · Ruhr University Bochum · Middlebury College +3 more

Attacks watermarking schemes for autoregressive image generators, achieving both watermark removal and forgery from a single reference image

Output Integrity Attack · vision · generative
PDF
defense · arXiv · Apr 13, 2026

Finetune Like You Pretrain: Boosting Zero-shot Adversarial Robustness in Vision-language Models

Songlong Xing, Weijie Wang, Zhengyu Zhao et al. · University of Trento · Fondazione Bruno Kessler +2 more

Adversarial finetuning for CLIP using web image-text pairs and contrastive learning to boost robustness across 14 domains

Input Manipulation Attack · vision · nlp · multimodal
PDF Code
defense · arXiv · Apr 13, 2026

Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning

Ajinkya Mohgaonkar, Lukas Gosch, Mahalakshmi Sabanayagam et al. · Technical University of Munich · Munich Data Science Institute +2 more

Certifies neural network robustness against label-flipping poisoning attacks using white-box partition-aggregation ensembles and neural tangent kernels

Data Poisoning Attack · vision
PDF
defense · arXiv · Apr 13, 2026

Learning Robustness at Test-Time from a Non-Robust Teacher

Stefano Bianchettin, Giulio Rossolini, Giorgio Buttazzo · Sant’Anna School of Advanced Studies

Test-time adaptation framework that learns adversarial robustness from non-robust pretrained models using unlabeled target data

Input Manipulation Attack · vision
PDF
attack · arXiv · Apr 13, 2026

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience

Hanbo Huang, Xuan Gong, Yiran Zhang et al. · Shanghai Jiao Tong University

RL-based black-box attack that spoofs LLM watermarks with 62% success using only 100 training pairs and no detector access

Output Integrity Attack · nlp
PDF
defense · arXiv · Apr 13, 2026

Geometry-Aware Localized Watermarking for Copyright Protection in Embedding-as-a-Service

Zhimin Chen, Xiaojie Liang, Wenbo Xu et al. · Sun Yat-Sen University

Embeds geometry-aware watermarks in embedding model outputs to prove ownership and detect stolen EaaS models

Model Theft · nlp · multimodal
PDF
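The general trigger-based EaaS watermarking idea can be sketched minimally (this illustrates the classic trigger-word approach, not this paper's geometry-aware localized scheme; the trigger words, dimensionality, and mixing weight are all illustrative): the provider nudges embeddings of texts containing secret trigger words toward a secret direction, and later checks whether a suspect model's outputs show the same alignment.

```python
import numpy as np

rng = np.random.default_rng(0)
SECRET = rng.normal(size=64)            # provider's secret watermark direction
SECRET /= np.linalg.norm(SECRET)
TRIGGERS = {"provenance", "watermark"}  # illustrative secret trigger words

def serve_embedding(text, base_embed):
    """EaaS server side: nudge embeddings of trigger texts toward SECRET."""
    e = base_embed(text)
    if TRIGGERS & set(text.lower().split()):
        e = 0.9 * e + 0.1 * SECRET
    return e / np.linalg.norm(e)

def verify(suspect_embed, probe_texts):
    """Owner side: mean alignment of probe-text embeddings with SECRET."""
    return float(np.mean([suspect_embed(t) @ SECRET for t in probe_texts]))

# Stand-in for a real embedding model: a constant unit vector.
base = lambda text: np.ones(64) / 8.0
wm = serve_embedding("data provenance report", base)
plain = serve_embedding("weather report", base)
assert wm @ SECRET > plain @ SECRET  # trigger texts align more with SECRET
```

A model distilled from the watermarked service inherits the trigger-direction correlation, which `verify` detects; clean models show no such alignment.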
attack · arXiv · Apr 12, 2026

Membership Inference Attacks Expose Participation Privacy in ECG Foundation Encoders

Ziyu Wang, Elahe Khatibi, Ankita Sharma et al. · University of California · Arizona State University +1 more

Audits membership inference attacks on ECG foundation encoders, finding participation leakage through embeddings and scores under realistic access models

Membership Inference Attack · timeseries
PDF
benchmark · arXiv · Apr 12, 2026

SEED: A Large-Scale Benchmark for Provenance Tracing in Sequential Deepfake Facial Edits

Mengieong Hoi, Zhedong Zheng, Ping Liu et al. · University of Macau · University of Nevada +1 more

Benchmark and detection method for tracing multi-step diffusion-based deepfake facial edits using frequency-aware analysis

Output Integrity Attack · vision · generative
PDF
defense · arXiv · Apr 12, 2026

Detecting RAG Extraction Attack via Dual-Path Runtime Integrity Game

Yuanbo Xie, Yingjie Zhang, Yulin Li et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +4 more

Runtime defense that embeds canary tokens in RAG-retrieved content to detect knowledge base leakage attacks in real time

Sensitive Information Disclosure · Prompt Injection · nlp
PDF
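The canary-token mechanism can be sketched minimally (function names and the marker format here are hypothetical, not taken from the paper): a unique marker is appended to retrieved chunks before they enter the prompt, and responses are scanned for that marker, since an extraction attack that coaxes the model into reproducing the knowledge base verbatim will leak the canary along with it.

```python
import secrets

def add_canary(chunks):
    """Tag each retrieved chunk with a unique per-request canary marker."""
    canary = f"[CANARY-{secrets.token_hex(8)}]"
    return [c + " " + canary for c in chunks], canary

def leaked(response, canary):
    """A response reproducing retrieved content verbatim carries the canary."""
    return canary in response

chunks, canary = add_canary(["Internal policy: refunds require manager approval."])
# An extraction attack that makes the model echo its context leaks the marker:
assert leaked("Here is my context: " + chunks[0], canary)
# A legitimately paraphrased answer does not:
assert not leaked("Refunds need a manager's approval.", canary)
```

In practice the canary would need to survive the model's generation verbatim (or be detected fuzzily), which is where the harder design work lies.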
defense · arXiv · Apr 12, 2026

Latent Instruction Representation Alignment: defending against jailbreaks, backdoors and undesired knowledge in LLMs

Eric Easley, Sebastian Farquhar · University of California · University of Oxford

Defense training LLMs to reinterpret malicious instructions as benign at the representation level, blocking jailbreaks and backdoors

Model Poisoning · Prompt Injection · Sensitive Information Disclosure · nlp
PDF
defense · arXiv · Apr 12, 2026

DuCodeMark: Dual-Purpose Code Dataset Watermarking via Style-Aware Watermark-Poison Design

Yuchen Chen, Yuan Xiao, Chunrong Fang et al. · Nanjing University

Embeds ownership watermarks in code training datasets using AST-based style triggers plus poisoned samples that degrade model performance if watermark is removed

Output Integrity Attack · Model Poisoning · nlp
PDF
defense · arXiv · Apr 12, 2026

Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation

Zeqian Long, Ozgur Kara, Haotian Xue et al. · University of Illinois Urbana-Champaign · Georgia Institute of Technology

Adversarial immunization that corrupts image-to-video generation by enforcing temporal latent divergence and trajectory misalignment across frames

Input Manipulation Attack · vision · multimodal · generative
PDF Code
defense · arXiv · Apr 12, 2026

Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models

Vu Tuan Truong, Long Bao Le · University of Quebec

Two-stage fine-tuning defense teaching LLMs critical thinking to detect and refuse malicious reasoning steps in backdoor attacks

Model Poisoning · nlp
PDF Code