Milad Nasr

benchmark arXiv Oct 10, 2025

Milad Nasr, Nicholas Carlini, Chawin Sitawarin et al. · OpenAI · Anthropic +6 more

Adaptive attacks via gradient descent, RL, and random search bypass 12 LLM jailbreak/prompt-injection defenses with >90% success rate

Input Manipulation Attack Prompt Injection nlp

34 citations · 4 influential

attack CCS Oct 2, 2025

Milad Nasr, Yanick Fratantonio, Luca Invernizzi et al. · Google DeepMind · OpenAI +2 more

Adversarial 13-byte modification evades Gmail's ML file-type routing model, bypassing the entire production malware detection pipeline

Input Manipulation Attack nlp

1 citation

defense arXiv Nov 12, 2025

Tiansheng Huang, Virat Shejwalkar, Oscar Chang et al. · Georgia Institute of Technology · Google DeepMind +1 more

Defends audio language models against representation-drift-based audio jailbreaks using robust reasoning training

Input Manipulation Attack Prompt Injection audio nlp

attack arXiv Jan 27, 2026

Harsh Chaudhari, Ethan Rathbun, Hanna Foerster et al. · Northeastern University · University of Cambridge +4 more

Poisons LLM CoT training data by corrupting reasoning traces to inject targeted behaviors into unseen domains without altering queries or answers

Data Poisoning Attack Training Data Poisoning nlp

Papers in Database (4)