Nicholas Carlini

benchmark · arXiv · Oct 10, 2025

Milad Nasr, Nicholas Carlini, Chawin Sitawarin et al. · OpenAI · Anthropic +6 more

Adaptive attacks via gradient descent, RL, and random search bypass 12 LLM jailbreak/prompt-injection defenses with success rates above 90%

Input Manipulation Attack · Prompt Injection · nlp

34 citations · 4 influential

tool · arXiv · Oct 17, 2025

Sarah Egler, John Schulman, Nicholas Carlini · MATS · Anthropic +1 more

LLM auditing agent detects adversarial fine-tuning attacks, including covert cipher backdoors, before model deployment

Transfer Learning Attack · Model Poisoning · Prompt Injection · nlp

3 citations

attack · CCS · Oct 2, 2025

Milad Nasr, Yanick Fratantonio, Luca Invernizzi et al. · Google DeepMind · OpenAI +2 more

Adversarial 13-byte modification evades Gmail's ML file-type routing model, bypassing the entire production malware detection pipeline

Input Manipulation Attack · nlp

1 citation

Papers in Database (3)