Aengus Lynch

survey arXiv Mar 31, 2026 · 6d ago

The Persistent Vulnerability of Aligned AI Systems

Aengus Lynch · University College London

Comprehensive AI safety thesis spanning mechanistic interpretability, sleeper agent defenses, jailbreaking frontier models, and autonomous agent misalignment

Input Manipulation Attack Prompt Injection Excessive Agency nlpvisionaudiomultimodal

PDF

Papers in Database (1)

The Persistent Vulnerability of Aligned AI Systems