Latest papers

2 papers
benchmark · arXiv · Oct 24, 2025

Quantifying CBRN Risk in Frontier Models

Divyanshu Kumar, Nitin Aravind Birur, Tanay Baswa et al. · Enkrypt AI

Benchmarks 10 frontier LLMs against CBRN jailbreak prompts, finding that Deep Inception attacks bypass safety filters 86% of the time, versus 34% for direct requests.

Prompt Injection · nlp
2 citations · PDF
attack · arXiv · Oct 23, 2025

Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually Simple Transformations

Divyanshu Kumar, Shreyas Jena, Nitin Aravind Birur et al. · Enkrypt AI

A systematic multimodal jailbreak study showing that perceptually simple image/audio transformations achieve 75–89% attack success rates (ASR) on frontier VLMs whose safety on the equivalent text-only prompts is near-perfect.

Prompt Injection · vision · audio · multimodal · nlp
PDF