Latest papers

1,905 papers
defense arXiv Apr 30, 2026 · 21d ago

VOW: Verifiable and Oblivious Watermark Detection for Large Language Models

Xiaokun Luan, Yihao Zhang, Pengcheng Su et al. · Peking University

Privacy-preserving watermark detection protocol using VOPRF that verifies LLM-generated text without revealing content to provider

Output Integrity Attack nlp
PDF
defense arXiv Apr 30, 2026 · 21d ago

PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models

Haocheng Huang, Yuchen Chen, Weisong Sun et al. · Soochow University · Nanjing University +1 more

Dataset watermarking scheme embedding stealth marks in code via variable name patterns to prove training data ownership

Output Integrity Attack nlp
PDF
defense arXiv Apr 30, 2026 · 21d ago

TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

Bowen Sun, Chaozhuo Li, Yaodong Yang et al. · Johns Hopkins University · Microsoft Research Asia +2 more

Dual-encoder defense that clusters fragmented malicious prompts across anonymous LLM requests using asymmetric contrastive learning

Prompt Injection nlp
PDF
defense arXiv Apr 30, 2026 · 21d ago

Secure Cross-Silo Synthetic Genomic Data Generation

Daniil Filienko, Martine De Cock, Sikha Pentyala · University of Washington Tacoma

Privacy-preserving federated synthetic genomic data generation using MPC for input privacy and differential privacy for output privacy

Model Inversion Attack federated-learning tabular
PDF
defense arXiv Apr 30, 2026 · 21d ago

MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks

Jona te Lintelo, Lichao Wu, Marina Krček et al. · Radboud University · University of Bristol +2 more

Reconfigures MoE LLM safety behavior by steering expert routing at inference time without retraining, defending against jailbreaks

Prompt Injection nlp
PDF
defense arXiv Apr 30, 2026 · 21d ago

AdaBFL: Multi-Layer Defensive Adaptive Aggregation for Byzantine-Robust Federated Learning

Zehui Tang, Yuchen Liu, Feihu Huang · Nanjing University of Aeronautics and Astronautics · MIIT Key Laboratory of Pattern Analysis and Machine Intelligence

Adaptive aggregation defense for federated learning that dynamically adjusts weights across multiple defense layers to counter Byzantine poisoning attacks

Data Poisoning Attack federated-learning
PDF
defense arXiv Apr 30, 2026 · 21d ago

Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a Novel Challenge

Sharayu Nilesh Deshmukh, Kailash A. Hambarde, Joana C. Costa et al. · Instituto de Telecomunicações · Universidade da Beira Interior

Extends deepfake detection with semantic mismatch detection, revealing vulnerabilities when authentic audio-video pairs are semantically inconsistent

Output Integrity Attack multimodal vision audio
PDF Code
defense arXiv Apr 30, 2026 · 21d ago

Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection

Shuchang Zhou, Shangkun Wu, Jiwei Wei et al. · University of Electronic Science and Technology of China · Harbin Institute of Technology

Detects AI-generated images by fusing frequency-domain artifacts with semantic features via gated injection and hyperspherical learning

Output Integrity Attack vision generative
PDF
defense arXiv Apr 29, 2026 · 22d ago

Robust Alignment: Harmonizing Clean Accuracy and Adversarial Robustness in Adversarial Training

Yanyun Wang, Qingqing Ye, Li Liu et al. · Hong Kong Polytechnic University · Hong Kong University of Science and Technology

Adversarial training method that harmonizes clean accuracy and robustness by aligning input perturbations with latent space representations

Input Manipulation Attack vision
PDF
defense arXiv Apr 29, 2026 · 22d ago

Which Face and Whose Identity? Solving the Dual Challenge of Deepfake Proactive Forensics in Multi-Face Scenarios

Lei Zhang, Zhiqing Guo, Dan Ma et al. · Xinjiang University · Hunan University

Embeds identity watermarks in multi-face images to localize deepfake-manipulated regions and trace forged identities in group photos

Output Integrity Attack vision multimodal
PDF
defense arXiv Apr 29, 2026 · 22d ago

TAP into the Patch Tokens: Leveraging Vision Foundation Model Features for AI-Generated Image Detection

Ahmed Abdullah, Nikolas Ebert, Oliver Wasenmüller · Mannheim University of Applied Sciences

Benchmarks vision foundation models for AI-generated image detection, achieving 12% accuracy improvement over CLIP with tunable attention pooling

Output Integrity Attack vision multimodal
PDF
defense arXiv Apr 29, 2026 · 22d ago

SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts

Yuan Xin, Yixuan Weng, Minjun Zhu et al. · CISPA · Westlake University +3 more

GAN-inspired co-evolutionary framework training attack generators and defenders to protect LLM review systems from hidden prompt injection

Prompt Injection nlp
PDF
defense arXiv Apr 29, 2026 · 22d ago

GIFGuard: Proactive Forensics against Deepfakes in Facial GIFs via Spatiotemporal Watermarking

Shupeng Che, Zhiqing Guo, Changtao Miao et al. · Xinjiang University · Ant Group +1 more

Spatiotemporal watermarking framework embedding robust signals in facial GIFs to verify authenticity and detect deepfake tampering

Output Integrity Attack vision multimodal
PDF
defense arXiv Apr 29, 2026 · 22d ago

Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

Hung Dang · Van Lang University

Stateful behavioral firewall for LLM agents using compiled benign traces to block context-sequential tool-call attacks

Insecure Plugin Design Excessive Agency nlp
PDF
defense arXiv Apr 29, 2026 · 22d ago

SafeTune: Mitigating Data Poisoning in LLM Fine-Tuning for RTL Code Generation

Mahshid Rezakhani, Nowfel Mashnoor, Kimia Azar et al. · University of Central Florida

Dual-layer filtering framework detecting poisoned training data in LLM RTL generation via GNN structural analysis and semantic prompt verification

Data Poisoning Attack Model Poisoning Training Data Poisoning nlp generative
PDF
defense arXiv Apr 29, 2026 · 22d ago

Attribution-Guided Multimodal Deepfake Detection via Cross-Modal Forensic Fingerprints

Wasim Ahmad, Wei Zhang, Xuerui Mao · Beijing Institute of Technology

Multimodal deepfake detector that jointly learns attribution and detection by aligning generator-specific fingerprints across audio and video

Output Integrity Attack multimodal vision audio
PDF
defense arXiv Apr 28, 2026 · 23d ago

Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles

Minh-Khoa Le-Phan, Minh-Hoang Le, Trong-Le Do et al. · University of Science

Multi-stream deepfake detector using DINOv2 and CLIP that maintains robust attention under compression and blur degradations

Output Integrity Attack vision multimodal
PDF Code
defense arXiv Apr 28, 2026 · 23d ago

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

Mengyao Du, Han Fang, Haokai Ma et al. · National University of Defense Technology · University of Science and Technology of China +2 more

Lightweight detector that identifies prompt injection attacks in web agent screenshots using visual gradient analysis and text recovery

Prompt Injection Excessive Agency multimodal nlp
PDF
defense arXiv Apr 28, 2026 · 23d ago

Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection

Jaskirat Sudan, Hashim Ali, Surya Subramani et al. · University of Michigan

Optimizes supervised contrastive learning for audio deepfake detection, achieving 8.29% EER on in-the-wild data via angular similarity

Output Integrity Attack audio
PDF
defense arXiv Apr 28, 2026 · 23d ago

Adversarial Robustness of NTK Neural Networks

Yuxuan Hou · Qiuzhen College · Tsinghua University

Proves NTK neural networks achieve minimax optimal adversarial robustness with early stopping but fail catastrophically when overfitted

Input Manipulation Attack tabular
PDF