David Lindner

h-index: 0 · 0 citations · 0 papers (total)

Papers in Database (1)

attack · arXiv · Feb 9, 2026

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

Oliver Daniels, Perusha Moodley, Ben Marlin et al. · MATS · University of Massachusetts Amherst +1 more

Automated red-team pipeline generates system prompts that fool both black-box and white-box LLM alignment auditing methods via strategic deception

Prompt Injection · nlp