Zhiyuan Xu

h-index: 1 · 16 citations · 2 papers (total)

Papers in Database (1)

attack · arXiv · Nov 21, 2025

Steering in the Shadows: Causal Amplification for Activation Space Attacks in Large Language Models

Zhiyuan Xu, Stanislav Abaimov, Joseph Gardiner et al. · University of Bristol

Novel activation-space attack exploits compression valleys in LLMs to steer behavior toward harmful outputs while evading conventional input/weight audits

Input Manipulation Attack · Prompt Injection · nlp