Chawin Sitawarin

h-index: 4 65 citations 9 papers (total)

Papers in Database (3)

benchmark arXiv Oct 10, 2025 · Oct 2025

The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections

Milad Nasr, Nicholas Carlini, Chawin Sitawarin et al. · OpenAI · Anthropic +6 more

Adaptive attacks via gradient descent, RL, and random search bypass 12 LLM jailbreak/prompt-injection defenses with >90% success rate

Input Manipulation Attack Prompt Injection nlp
34 citations 4 influentialPDF
attack arXiv Oct 21, 2025 · Oct 2025

Extracting alignment data in open models

Federico Barbero, Xiangming Gu, Christopher A. Choquette-Choo et al. · University of Oxford · National University of Singapore +4 more

Extracts LLM alignment training data via chat template prompting, finding embedding similarity reveals 10x more memorization than string matching

Model Inversion Attack Sensitive Information Disclosure nlp
4 citations PDF
defense arXiv Oct 24, 2025 · Oct 2025

Soft Instruction De-escalation Defense

Nils Philipp Walter, Chawin Sitawarin, Jamie Hayes et al. · CISPA Helmholtz Center for Information Security · Google DeepMind +1 more

Defends LLM agents against indirect prompt injection via iterative sanitization, limiting adversarial attack success rate to 15%

Prompt Injection nlp
2 citations PDF