Milad Nasr

h-index: 31 · 12,930 citations · 79 papers (total)

Papers in Database (4)

benchmark arXiv Oct 10, 2025 · Oct 2025

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections

Milad Nasr, Nicholas Carlini, Chawin Sitawarin et al. · OpenAI · Anthropic +6 more

Adaptive attacks via gradient descent, RL, and random search bypass 12 LLM jailbreak/prompt-injection defenses with >90% success rate

Input Manipulation Attack Prompt Injection nlp
34 citations · 4 influential · PDF
attack CCS Oct 2, 2025 · Oct 2025

Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks

Milad Nasr, Yanick Fratantonio, Luca Invernizzi et al. · Google DeepMind · OpenAI +2 more

Adversarial 13-byte modification evades Gmail's ML file-type routing model, bypassing the entire production malware detection pipeline

Input Manipulation Attack nlp
1 citation · PDF
defense arXiv Nov 12, 2025 · Nov 2025

Rebellion: Noise-Robust Reasoning Training for Audio Reasoning Models

Tiansheng Huang, Virat Shejwalkar, Oscar Chang et al. · Georgia Institute of Technology · Google DeepMind +1 more

Defends audio language models against representation-drift-based audio jailbreaks using robust reasoning training

Input Manipulation Attack Prompt Injection audio nlp
PDF
attack arXiv Jan 27, 2026 · Jan 2026

Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models

Harsh Chaudhari, Ethan Rathbun, Hanna Foerster et al. · Northeastern University · University of Cambridge +4 more

Poisons LLM CoT training data by corrupting reasoning traces to inject targeted behaviors into unseen domains without altering queries or answers

Data Poisoning Attack Training Data Poisoning nlp
PDF