Latest papers

3 papers
defense · arXiv · Apr 7, 2026 · 6w ago

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

Manish Bhatt, Sarthak Munshi, Vineeth Sai Narajala et al. · OWASP · Amazon Web Services +3 more

Proves that continuous, utility-preserving prompt filters cannot eliminate all LLM jailbreaks, due to topological constraints on the prompt space

Prompt Injection · nlp
tool · arXiv · Mar 18, 2026 · 9w ago

LAAF: Logic-layer Automated Attack Framework: A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

Hammad Atta, Ken Huang, Kyriakos Rock Lambros et al. · Qorvex Consulting · Distributedapps.ai +8 more

Automated red-teaming framework for multi-stage prompt injection attacks on agentic LLMs with persistent memory and RAG

Prompt Injection · Excessive Agency · nlp
benchmark · arXiv · Feb 25, 2026 · 12w ago

Manifold of Failure: Behavioral Attraction Basins in Language Models

Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala et al. · Amazon · Cisco +2 more

Maps the topology of LLM safety failures using quality-diversity optimization, revealing behavioral attraction basins across three frontier models

Prompt Injection · nlp