Latest papers

10 papers
attack arXiv Feb 22, 2026 · 6w ago

Learning to Detect Language Model Training Data via Active Reconstruction

Junjie Oscar Yin, John X. Morris, Vitaly Shmatikov et al. · University of Washington · Cornell University +2 more

Uses reinforcement learning to fine-tune LLMs and detect training data membership via active reconstruction, outperforming passive MIAs by 10.7%

Membership Inference Attack · Sensitive Information Disclosure · nlp
PDF
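For context on the passive baselines this paper reports outperforming, a minimal loss-threshold membership-inference check can be sketched as follows. This is purely illustrative: the threshold and per-token log-probabilities are made up, and the paper's active-reconstruction method is more involved than any static score.

```python
import math

def sequence_loss(token_log_probs):
    """Average negative log-likelihood of a token sequence under a model."""
    return -sum(token_log_probs) / len(token_log_probs)

def passive_mia(token_log_probs, threshold=2.0):
    """Classic passive loss-threshold attack: a suspiciously low loss
    suggests the sequence was seen during training (membership)."""
    return sequence_loss(token_log_probs) < threshold

# Hypothetical per-token log-probabilities from a target model:
member_like = [math.log(0.9)] * 10      # confidently predicted -> low loss
nonmember_like = [math.log(0.05)] * 10  # poorly predicted -> high loss
```

Active approaches like the one above instead fine-tune an attacker model (here, with RL) to reconstruct candidate training sequences, rather than thresholding a fixed score.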
benchmark arXiv Feb 21, 2026 · 6w ago

Prior Aware Memorization: An Efficient Metric for Distinguishing Memorization from Generalization in Large Language Models

Trishita Tiwari, Ari Trachtenberg, G. Edward Suh · Cornell University · Boston University +1 more

Proposes Prior-Aware Memorization metric showing 55–90% of LLM 'memorized' sequences are actually statistically common, not genuine leakage

Model Inversion Attack · Sensitive Information Disclosure · nlp
PDF
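The core idea, discounting sequences that any model would find likely, can be illustrated with a toy score comparing the target model's log-probability against a generic "prior" model's. The function names, margin, and exact form here are assumptions for illustration, not the paper's metric.

```python
def memorization_score(target_logprob, prior_logprob):
    """How much more likely the target model finds a sequence than a
    generic prior model does. Statistically common text scores low
    even when the target model reproduces it verbatim."""
    return target_logprob - prior_logprob

def is_genuine_leak(target_logprob, prior_logprob, margin=5.0):
    """Flag only sequences whose likelihood the prior cannot explain."""
    return memorization_score(target_logprob, prior_logprob) > margin

# A common idiom: both models rate it likely -> not counted as leakage.
# A unique secret: only the trained-on model rates it likely -> leakage.
```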
attack arXiv Jan 29, 2026 · 9w ago

ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoning Models

Xiaogeng Liu, Xinyan Wang, Yechao Zhang et al. · Johns Hopkins University · NVIDIA +4 more

RL-trained attacker generates short natural prompts that force LRMs into pathologically long reasoning, achieving 286x amplification and >98% detection bypass

Model Denial of Service · nlp · reinforcement-learning
PDF
defense arXiv Jan 24, 2026 · 10w ago

Revealing the Truth with ConLLM for Detecting Multi-Modal Deepfakes

Gautam Siddharth Kashyap, Harsh Joshi, Niharika Jain et al. · Macquarie University · Bharati Vidyapeeth’s College Of Engineering +4 more

Proposes ConLLM, a contrastive learning + LLM framework for detecting audio, video, and audio-visual deepfakes

Output Integrity Attack · multimodal · audio · vision · nlp
PDF Code
tool arXiv Dec 21, 2025 · Dec 2025

Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models

Zhang Wei, Peilu Hu, Zhenyuan Wei et al. · Independent Researcher · Ltd. +12 more

Automated red-teaming tool for LLMs using meta-prompt-guided adversarial generation, finding 3.9× more vulnerabilities than manual testing

Prompt Injection · nlp
1 citation · PDF
survey IACR ePrint Dec 1, 2025 · Dec 2025

Systems Security Foundations for Agentic Computing

Mihai Christodorescu, Earlence Fernandes, Ashish Hooda et al. · Google · University of California +5 more

Surveys agentic AI security through a systems-security lens, covering prompt injection, tool-use risks, and 11 real-world attack case studies

Prompt Injection · Insecure Plugin Design · Excessive Agency · nlp
3 citations · PDF
attack arXiv Nov 27, 2025 · Nov 2025

Can Protective Watermarking Safeguard the Copyright of 3D Gaussian Splatting?

Wenkai Huang, Yijia Guo, Gaolei Li et al. · Shanghai Jiao Tong University · Shanghai Key Laboratory of Integrated Administration Technologies for Information Security +4 more

Attacks copyright watermarks on 3D Gaussian Splatting assets by isolating and removing watermark-bearing Gaussian primitives via view-dependent rendering analysis

Output Integrity Attack · vision
1 citation · PDF
defense TIFS Nov 2, 2025 · Nov 2025

Parameter Interpolation Adversarial Training for Robust Image Classification

Xin Liu, Yichen Yang, Kun He et al. · Huazhong University of Science and Technology · Cornell University

Improves adversarial training stability via parameter interpolation between epochs, reducing overfitting and robustness oscillation in CNNs and ViTs

Input Manipulation Attack · vision
9 citations · 1 influential · PDF
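A minimal sketch of the parameter-interpolation idea: smooth the weights between consecutive epochs so the adversarially trained model does not oscillate. The interpolation coefficient and the dict-of-floats parameter representation are illustrative simplifications, not the paper's exact scheme.

```python
def interpolate_params(prev_epoch, curr_epoch, alpha=0.5):
    """Elementwise convex combination of last epoch's parameters and
    the current ones; alpha=1.0 recovers plain adversarial training,
    smaller alpha damps epoch-to-epoch robustness oscillation."""
    return {name: (1 - alpha) * prev_epoch[name] + alpha * curr_epoch[name]
            for name in curr_epoch}

prev = {"w": 1.0, "b": 0.0}  # weights after epoch t-1
curr = {"w": 3.0, "b": 2.0}  # weights after epoch t's adversarial updates
smoothed = interpolate_params(prev, curr)  # {"w": 2.0, "b": 1.0}
```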
defense arXiv Oct 20, 2025 · Oct 2025

Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

Rishi Jha, Harold Triedman, Justin Wagle et al. · Cornell University · Microsoft

Breaks alignment-based defenses for LLM multi-agent control-flow hijacking and proposes ControlValve using control-flow graphs and least privilege

Prompt Injection · Excessive Agency · nlp
3 citations · PDF
benchmark arXiv Sep 6, 2025 · Sep 2025

Benchmarking Robust Aggregation in Decentralized Gradient Marketplaces

Zeyu Song, Sainyam Galhotra, Shagufta Mehnaz · Pennsylvania State University · Cornell University

Benchmarks robust aggregation defenses against Byzantine and Sybil attacks in decentralized federated gradient marketplaces with new economic metrics

Data Poisoning Attack · federated-learning
PDF
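One standard robust-aggregation rule such a benchmark would exercise is the coordinate-wise median, which tolerates a minority of Byzantine submissions. This is a generic textbook rule sketched for illustration, not necessarily one of the paper's evaluated baselines.

```python
import statistics

def coordinate_median(gradients):
    """Aggregate client gradients by taking the median in each
    coordinate, so a minority of arbitrarily corrupted submissions
    cannot drag the result far, unlike a plain average."""
    dim = len(gradients[0])
    return [statistics.median(g[i] for g in gradients) for i in range(dim)]

honest = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]
byzantine = [[100.0, -100.0]]                # one poisoned submission
agg = coordinate_median(honest + byzantine)  # stays near [1.0, 2.0]
```

A plain mean over the same four submissions would land at [25.75, -23.5], i.e. fully hijacked by the single attacker.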