ML Security Papers

Latest papers

2 papers

attack arXiv Feb 26, 2026 · 5w ago

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Xun Huang, Simeng Qin, Xiaoshuang Jia et al. · Nanyang Technological University · BraneMatrix AI +7 more

Bio-inspired optimization generates classical Chinese jailbreak prompts that defeat modern-language safety guardrails in black-box LLMs

Prompt Injection nlp

PDF

attack arXiv Aug 14, 2025 · Aug 2025

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts

Chiyu Zhang, Lu Zhou, Xiaogang Xu et al. · Nanjing University of Aeronautics and Astronautics · Collaborative Innovation Center of Novel Software Technology and Industrialization +2 more

Novel black-box jailbreak attack combining adversarial context alignment and fake chain-of-thought to bypass reasoning LLM safety guardrails

Prompt Injection nlp

PDF

Latest papers

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue