ML Security Papers

Latest papers

10 papers

defense arXiv Mar 22, 2026

Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains

Octavian Untila · Aisophical SRL

Autonomous AI system independently discovers SMT-based formal verification for AI safety across six domains with 100% accuracy

Output Integrity Attack Insecure Plugin Design Excessive Agency Vulnerability Discovery Patch & Remediation nlp multimodal

PDF

attack arXiv Mar 19, 2026

Automated Membership Inference Attacks: Discovering MIA Signal Computations using LLM Agents

Toan Tran, Olivera Kotevska, Li Xiong · Emory University · Oak Ridge National Laboratory

LLM-agent framework that automatically discovers novel membership inference attack strategies, achieving 0.18 AUC improvement over existing MIAs

Membership Inference Attack Vulnerability Discovery Red-Team Agents

PDF

attack arXiv Mar 10, 2026

Compatibility at a Cost: Systematic Discovery and Exploitation of MCP Clause-Compliance Vulnerabilities

Nanzi Yang, Weiheng Bai, Kangjie Lu · University of Minnesota

Systematically exploits MCP SDK non-compliance vulnerabilities to launch silent prompt injection and DoS attacks against LLM agents

Insecure Plugin Design Prompt Injection Vulnerability Discovery nlp

PDF

benchmark arXiv Feb 18, 2026 · Feb 2026

Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis

Scott Thornton · Perfecxion.ai

Benchmark study finds adversarial code comments fail to meaningfully fool LLM vulnerability detectors across eight frontier models in 14,012 trials

Prompt Injection Vulnerability Discovery Benchmarks & Evaluation nlp

PDF

attack arXiv Jan 30, 2026 · Jan 2026

Semantics-Preserving Evasion of LLM Vulnerability Detectors

Luze Sun, Alina Oprea, Eric Wong · Northeastern University · University of Pennsylvania

Carrier-constrained GCG attacks evade LLM-based code vulnerability detectors using behavior-preserving code transformations that transfer to black-box APIs

Input Manipulation Attack Vulnerability Discovery Benchmarks & Evaluation nlp

PDF Code

survey arXiv Dec 20, 2025 · Dec 2025

SoK: Understanding (New) Security Issues Across AI4Code Use Cases

Qilong Wu, Taoran Li, Tianyang Zhou et al. · University of Illinois Urbana-Champaign

SoK survey spanning adversarial robustness of vulnerability detectors, insecure LLM code generation, and security gaps in AI4Code benchmarks

Input Manipulation Attack Prompt Injection Vulnerability Discovery Benchmarks & Evaluation nlp

1 citation PDF

defense arXiv Oct 20, 2025 · Oct 2025

BlueCodeAgent: A Blue Teaming Agent Enabled by Automated Red Teaming for CodeGen AI

Chengquan Guo, Yuzhou Nie, Chulin Xie et al. · University of Chicago · UC Santa Barbara +3 more

Blue teaming agent for CodeGen LLMs using automated red teaming to detect malicious instructions and vulnerable code outputs

Prompt Injection Blue-Team Agents Vulnerability Discovery Red-Team Agents nlp

PDF

attack arXiv Sep 29, 2025 · Sep 2025

Takedown: How It's Done in Modern Coding Agent Exploits

Eunkyu Lee, Donghyeon Kim, Wonyoung Kim et al. · KAIST

Exploits 15 vulnerabilities in 8 real coding agents via insecure tool design, achieving command execution and data exfiltration without user interaction

Insecure Plugin Design Excessive Agency Red-Team Agents Vulnerability Discovery nlp

3 citations PDF

attack NDSS Aug 24, 2025 · Aug 2025

Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias

Shir Bernstein, David Beste, Daniel Ayzenshteyn et al. · Ben-Gurion University of the Negev · CISPA Helmholtz Center for Information Security

Adversarial code inputs exploiting LLM pattern-recognition bias to hijack static analysis and hide bugs from code-reviewing LLMs

Input Manipulation Attack Prompt Injection Vulnerability Discovery Benchmarks & Evaluation nlp

PDF

tool arXiv Aug 5, 2025 · Aug 2025

ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants

Xiangzhe Xu, Guangyu Shen, Zian Su et al. · Purdue University

Automated knowledge-graph-guided red-teaming agent finds 11–66% more safety violations in AI coding assistants than prior tools

Prompt Injection Red-Team Agents Vulnerability Discovery nlp

PDF

