attack arXiv Mar 7, 2026 · 4w ago
Jialai Wang, Ya Wen, Zhongmou Liu et al. · National University of Singapore · Tsinghua University +1 more
Flip-Agent exploits hardware bit-flips to corrupt LLM agent weights, hijacking tool calls and final outputs in multi-stage pipelines
Model Poisoning Excessive Agency nlp
Targeted bit-flip attacks (BFAs) exploit hardware faults to manipulate model parameters, posing a significant security threat. While prior work targets single-step inference models (e.g., image classifiers), LLM-based agents with multi-stage pipelines and external tools present new attack surfaces, which remain unexplored. This work introduces Flip-Agent, the first targeted BFA framework for LLM-based agents, manipulating both final outputs and tool invocations. Our experiments show that Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks, revealing a critical vulnerability in LLM-based agent systems.
llm transformer National University of Singapore · Tsinghua University · Huazhong University of Science and Technology