Latest papers

3 papers
defense arXiv Mar 31, 2026 · 6d ago

Robust Multimodal Safety via Conditional Decoding

Anurag Kumar, Raghuveer Peri, Jon Burnsky et al. · The Ohio State University · AWS

Conditional decoding defense using internal safety classification that blocks multimodal jailbreaks across text, image, and audio inputs

Input Manipulation Attack Prompt Injection multimodalnlpvisionaudio
PDF
benchmark arXiv Dec 31, 2025 · Dec 2025

Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing

Manish Bhatt, Adrian Wood, Idan Habler et al. · OWASP · Amazon +3 more

Adapts Go-Explore to red-team LLM tool-using agents, finding seed variance (8x spread) dominates algorithmic choice in prompt injection discovery

Prompt Injection Excessive Agency nlp
PDF Code
defense SSRN Oct 8, 2025 · Oct 2025

A2AS: Agentic AI Runtime Security and Self-Defense

Eugene Neelou, Ivan Novikov, Max Moroz et al. · A2AS · OWASP +10 more

Proposes A2AS runtime security framework for LLM agents enforcing prompt authentication, behavior boundaries, and in-context defenses

Prompt Injection Excessive Agency nlp
3 citations PDF