ML Security Papers

Latest papers

6 papers

defense arXiv Apr 7, 2026 · 6w ago

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

Manish Bhatt, Sarthak Munshi, Vineeth Sai Narajala et al. · OWASP · Amazon Web Services +3 more

Proves continuous utility-preserving prompt filters cannot eliminate all LLM jailbreaks due to topological constraints on prompt space

Prompt Injection nlp

PDF Code

tool arXiv Mar 18, 2026 · 9w ago

LAAF: Logic-layer Automated Attack Framework: A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

Hammad Atta, Ken Huang, Kyriakos Rock Lambros et al. · Qorvex Consulting · Distributedapps.ai +8 more

Automated red-teaming framework for multi-stage prompt injection attacks on agentic LLMs with persistent memory and RAG

Prompt Injection Excessive Agency nlp

PDF

tool arXiv Feb 25, 2026 · 12w ago

Adversarial Hubness Detector: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

Idan Habler, Vineeth Sai Narajala, Stav Koren et al. · Cisco · OWASP +1 more

Open-source scanner (hubscan) detecting adversarially crafted hub documents injected into RAG vector databases to hijack LLM context

Data Poisoning Attack Prompt Injection nlp multimodal

PDF Code

benchmark arXiv Dec 31, 2025 · Dec 2025

Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing

Manish Bhatt, Adrian Wood, Idan Habler et al. · OWASP · Amazon +3 more

Adapts Go-Explore to red-team LLM tool-using agents, finding seed variance (8x spread) dominates algorithmic choice in prompt injection discovery

Prompt Injection Excessive Agency Red-Team Agents Benchmarks & Evaluation nlp

PDF Code

tool arXiv Dec 29, 2025 · Dec 2025

Temporal Attack Pattern Detection in Multi-Agent AI Workflows: An Open Framework for Training Trace-Based Security Models

Ron F. Del Rosario · SAP · OWASP

Fine-tunes LLMs via QLoRA to detect temporal attack patterns in multi-agent AI workflows using OpenTelemetry trace analysis

Excessive Agency Prompt Injection nlp

PDF

defense SSRN Oct 8, 2025 · Oct 2025

A2AS: Agentic AI Runtime Security and Self-Defense

Eugene Neelou, Ivan Novikov, Max Moroz et al. · A2AS · OWASP +10 more

Proposes A2AS runtime security framework for LLM agents enforcing prompt authentication, behavior boundaries, and in-context defenses

Prompt Injection Excessive Agency nlp

3 citations PDF
