Published on arXiv

2603.10042

Model Poisoning

OWASP ML Top 10 — ML10

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Flip-Agent significantly outperforms existing targeted BFAs on real-world LLM agent tasks, demonstrating that multi-stage agent pipelines with external tools create exploitable attack surfaces beyond those of single-step inference models.

Flip-Agent

Novel technique introduced


Targeted bit-flip attacks (BFAs) exploit hardware faults to manipulate model parameters, posing a significant security threat. While prior work targets single-step inference models (e.g., image classifiers), LLM-based agents with multi-stage pipelines and external tools present new attack surfaces that remain unexplored. This work introduces Flip-Agent, the first targeted BFA framework for LLM-based agents, capable of manipulating both final outputs and tool invocations. Our experiments show that Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks, revealing a critical vulnerability in LLM-based agent systems.


Key Contributions

  • First targeted bit-flip attack framework (Flip-Agent) for LLM-based agents, extending BFAs beyond single-step classifiers to multi-stage agent pipelines
  • Novel attack surface analysis identifying vulnerabilities in both final output generation and tool invocation steps of LLM agents
  • Empirical demonstration that Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks

🛡️ Threat Analysis

Model Poisoning

Bit-flip attacks directly corrupt model weight parameters in memory (via hardware fault exploitation, e.g., Rowhammer) to induce targeted malicious behavior in LLM agents. This amounts to direct weight-level model poisoning or trojaning: analogous to backdoor injection, but executed at inference time via DRAM-level bit manipulation rather than training-time data poisoning.
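The core primitive is a single-bit corruption of a stored weight. As an illustrative sketch (not code from the paper), flipping one exponent bit of a weight stored in IEEE-754 float32 can change its magnitude by dozens of orders of magnitude:

```python
import struct

def flip_bit(weight: float, bit: int) -> float:
    """Flip one bit of a weight stored as IEEE-754 float32,
    mimicking a Rowhammer-style DRAM fault."""
    (raw,) = struct.unpack("<I", struct.pack("<f", weight))  # float32 -> raw 32 bits
    flipped = raw ^ (1 << bit)                               # flip the chosen bit
    (out,) = struct.unpack("<f", struct.pack("<I", flipped)) # reinterpret as float32
    return out

# Flipping the top exponent bit (bit 30) of 0.5 yields 2**127 (~1.7e38),
# large enough to saturate downstream activations.
corrupted = flip_bit(0.5, 30)
```

Targeted BFAs build on this primitive by searching for the few (weight, bit) pairs whose flips most strongly steer the model toward the attacker's chosen output; the search procedure itself is attack-specific.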


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, inference_time, targeted
Applications
llm-based agents, tool invocation, multi-stage agent pipelines