Latest papers

1,734 papers
defense arXiv Apr 2, 2026 · 4d ago

Combating Data Laundering in LLM Training

Muxing Li, Zesheng Ye, Sharon Li et al. · University of Melbourne · University of Wisconsin-Madison

Detects unauthorized LLM training data use even when original data has been laundered through style transformations

Membership Inference Attack Sensitive Information Disclosure nlp
PDF
defense arXiv Apr 2, 2026 · 4d ago

Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations

Yue Li, Linying Xue, Kaiqing Lin et al. · National Huaqiao University · Shenzhen University +2 more

Diffusion-guided adversarial perturbation defense protecting facial images from deepfake manipulation in both white-box and black-box settings

Input Manipulation Attack vision generative
PDF
defense arXiv Apr 2, 2026 · 4d ago

Moiré Video Authentication: A Physical Signature Against AI Video Generation

Yuan Qing, Kunyu Zheng, Lingxiao Li et al. · Boston University

Physics-based video authentication using Moiré interference patterns that real cameras produce but AI generators cannot faithfully reproduce

Output Integrity Attack vision generative
PDF
defense arXiv Apr 2, 2026 · 4d ago

From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

Yiheng Huang, Zhijia Zhao, Bihuan Chen et al. · Fudan University

Constructs dataset of 114 malicious MCP servers exploiting LLM tool-calling and proposes behavioral deviation detector achieving 94.6% F1

Insecure Plugin Design nlp
PDF
defense arXiv Apr 1, 2026 · 5d ago

TRACE: Training-Free Partial Audio Deepfake Detection via Embedding Trajectory Analysis of Speech Foundation Models

Awais Khan, Muhammad Umar Farooq, Kutub Uddin et al. · University of Michigan-Flint

Training-free partial audio deepfake detector using speech foundation model embedding dynamics, achieving competitive performance without labeled data

Output Integrity Attack audio
PDF
defense arXiv Apr 1, 2026 · 5d ago

RAGShield: Provenance-Verified Defense-in-Depth Against Knowledge Base Poisoning in Government Retrieval-Augmented Generation Systems

KrishnaSaiReddy Patil

Defense-in-depth framework using cryptographic provenance verification to block knowledge base poisoning attacks in government RAG systems

Data Poisoning Attack Training Data Poisoning nlp
PDF
defense arXiv Apr 1, 2026 · 5d ago

Shapley-Guided Neural Repair Approach via Derivative-Free Optimization

Xinyu Sun, Wanwei Liu, Haoang Chi et al. · National University of Defense Technology · Nanjing University +1 more

Interpretable DNN repair using Shapley-guided fault localization and derivative-free optimization for backdoor removal, adversarial defense, and fairness

Input Manipulation Attack Model Poisoning vision
PDF
defense arXiv Apr 1, 2026 · 5d ago

WARP: Guaranteed Inner-Layer Repair of NLP Transformers

Hsin-Ling Hsu, Min-Yu Chen, Nai-Chia Chen et al. · National Chengchi University

Constraint-based model repair framework providing provable guarantees for correcting adversarial misclassifications in NLP Transformers

Input Manipulation Attack nlp
PDF
defense arXiv Apr 1, 2026 · 5d ago

SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits

Zikai Zhang, Rui Hu, Olivera Kotevska et al. · University of Nevada · Oak Ridge National Laboratory

Detects LLM jailbreak attacks using logit distributions over numerical tokens, achieving 22.66% ASR reduction with minimal overhead

Prompt Injection nlp
PDF
defense arXiv Apr 1, 2026 · 5d ago

AgentWatcher: A Rule-based Prompt Injection Monitor

Yanting Wang, Wei Zou, Runpeng Geng et al. · The Pennsylvania State University

Rule-based prompt injection detector using causal attribution to identify malicious context segments in long-context LLM agents

Prompt Injection Excessive Agency nlp
PDF Code
defense arXiv Apr 1, 2026 · 5d ago

PDA: Text-Augmented Defense Framework for Robust Vision-Language Models against Adversarial Image Attacks

Jingning Xu, Haochen Luo, Chen Liu · City University of Hong Kong

Training-free defense using text augmentation to protect VLMs against diverse adversarial image perturbations at inference time

Input Manipulation Attack multimodal vision nlp
PDF
defense arXiv Mar 31, 2026 · 6d ago

Diffusion-Based Feature Denoising with NNMF for Robust handwritten digit multi-class classification

Hiba Adil Al-kharsan, Róbert Rajkó

Defends handwritten digit classifiers against adversarial examples using diffusion-based feature-space denoising with hybrid CNN-NNMF representations

Input Manipulation Attack vision
PDF
defense arXiv Mar 31, 2026 · 6d ago

Robust Multimodal Safety via Conditional Decoding

Anurag Kumar, Raghuveer Peri, Jon Burnsky et al. · The Ohio State University · AWS

Conditional decoding defense using internal safety classification that blocks multimodal jailbreaks across text, image, and audio inputs

Input Manipulation Attack Prompt Injection multimodal nlp vision audio
PDF
defense arXiv Mar 31, 2026 · 6d ago

CIPHER: Counterfeit Image Pattern High-level Examination via Representation

Kyeonghun Kim, Youngung Han, Seoyoung Ju et al. · OUTTA · Seoul National University

Deepfake detector reusing GAN/diffusion discriminators to identify synthetic faces across nine generative models with 74% F1-score

Output Integrity Attack vision generative
PDF
defense arXiv Mar 31, 2026 · 6d ago

Refined Detection for Gumbel Watermarking

Tor Lattimore · Google DeepMind

Near-optimal detection test for Gumbel watermarking of LLM text outputs with problem-dependent statistical efficiency guarantees

Output Integrity Attack nlp
PDF
defense arXiv Mar 31, 2026 · 6d ago

AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

Yubo Cui, Xianchao Guan, Zijun Xiong et al. · Harbin Institute of Technology · Shenzhen Loop Area Institute

Adversarial fine-tuning framework that preserves vision-language alignment while defending CLIP against adversarial perturbations in zero-shot settings

Input Manipulation Attack vision nlp multimodal
PDF Code
defense arXiv Mar 31, 2026 · 6d ago

PromptForge-350k: A Large-Scale Dataset and Contrastive Framework for Prompt-Based AI Image Forgery Localization

Jianpeng Wang, Haoyu Wang, Baoying Chen et al.

Detects and localizes prompt-based AI image edits using contrastive learning, achieving 62.5% IoU on new 350k dataset

Output Integrity Attack vision multimodal
PDF
defense arXiv Mar 31, 2026 · 6d ago

Multi-Feature Fusion Approach for Generative AI Images Detection

Abderrezzaq Sendjasni, Mohamed-Chaker Larabi · University of Poitiers

Fuses statistical, semantic, and texture features to detect AI-generated images across diverse generative models with improved robustness

Output Integrity Attack vision generative
PDF
defense arXiv Mar 30, 2026 · 7d ago

CivicShield: A Cross-Domain Defense-in-Depth Framework for Securing Government-Facing AI Chatbots Against Multi-Turn Adversarial Attacks

KrishnaSaiReddy Patil

Seven-layer defense framework for government AI chatbots achieving 73% detection against jailbreaks with graduated human escalation

Prompt Injection nlp
PDF
defense arXiv Mar 30, 2026 · 7d ago

Lipschitz verification of neural networks through training

Simon Kuang, Yuezhu Xu, S. Sivaranjani et al. · University of California · Purdue University

Trains certifiably robust neural networks by penalizing the trivial Lipschitz bound during training, achieving tight provable robustness guarantees

Input Manipulation Attack vision
PDF