TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?
Lewen Yan, Jilin Mei, Tianyi Zhou, Lige Huang, Jie Zhang, Dongrui Liu, Jing Shao
Published on arXiv: 2512.02261
Excessive Agency
OWASP LLM Top 10 — LLM08
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Small perturbations at any single component of an LLM trading agent propagate through the decision loop, causing extreme portfolio concentration, runaway exposure, and large drawdowns across both adaptive and procedural agent architectures.
TradeTrap
Novel technique introduced
LLM-based trading agents are increasingly deployed in real-world financial markets to perform autonomous analysis and execution. However, their reliability and robustness under adversarial or faulty conditions remain largely unexamined, even though these agents operate in high-risk, irreversible financial environments. We propose TradeTrap, a unified evaluation framework for systematically stress-testing both adaptive and procedural autonomous trading agents. TradeTrap targets four core components of autonomous trading agents (market intelligence, strategy formulation, portfolio and ledger handling, and trade execution) and evaluates their robustness under controlled system-level perturbations. All evaluations are conducted in a closed-loop historical backtesting setting on real U.S. equity market data with identical initial conditions, enabling fair and reproducible comparisons across agents and attacks. Extensive experiments show that small perturbations at a single component can propagate through the agent decision loop and induce extreme concentration, runaway exposure, and large portfolio drawdowns across both agent types, demonstrating that current autonomous trading agents can be systematically misled at the system level. Our code is available at https://github.com/Yanlewen/TradeTrap.
Key Contributions
- Introduces TradeTrap, a unified adversarial evaluation framework targeting four components of LLM trading agents (market intelligence, strategy formulation, portfolio/ledger handling, trade execution)
- Demonstrates that small system-level perturbations at a single component cascade through the agent decision loop, inducing extreme concentration, runaway exposure, and large portfolio drawdowns
- Evaluates agents in closed-loop historical backtesting on real U.S. equity data with identical initial conditions, enabling reproducible comparisons across both adaptive and procedural agent types
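To make the idea of a single-component fault propagating through a closed loop concrete, here is a minimal toy sketch in Python. It is not the TradeTrap implementation and does not reproduce any of the paper's attacks or results. Every name in it (`run_backtest`, `stale_ledger`, `PRICES`, the fixed 50% target weight) and the price series are assumptions made purely for illustration. The toy agent is a rule-based rebalancer rather than an LLM, and the injected fault is a ledger view that never reflects executed fills.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Portfolio:
    cash: float
    shares: float


def honest_ledger(true_shares: float) -> float:
    """Ledger view that reports the true position."""
    return true_shares


def stale_ledger(true_shares: float) -> float:
    """Hypothetical fault: the ledger never records fills, so it always reports 0."""
    return 0.0


# Illustrative, made-up price path (not real market data).
PRICES: List[float] = [100.0, 101.0, 99.0, 102.0, 100.0]


def run_backtest(
    prices: List[float],
    target_weight: float = 0.5,
    ledger_view: Callable[[float], float] = honest_ledger,
    initial_cash: float = 10_000.0,
) -> List[float]:
    """Run a closed-loop rebalancing backtest and return the true exposure per step."""
    pf = Portfolio(cash=initial_cash, shares=0.0)
    exposures: List[float] = []
    for price in prices:
        # The agent sees its position only through the (possibly faulty) ledger.
        reported_shares = ledger_view(pf.shares)
        believed_equity = pf.cash + reported_shares * price
        desired_shares = target_weight * believed_equity / price
        order = desired_shares - reported_shares

        # Execution always acts on the true portfolio state.
        pf.shares += order
        pf.cash -= order * price

        true_equity = pf.cash + pf.shares * price
        exposures.append(pf.shares * price / true_equity)
    return exposures


clean = run_backtest(PRICES)
faulty = run_backtest(PRICES, ledger_view=stale_ledger)
print("clean exposure: ", [round(e, 3) for e in clean])
print("faulty exposure:", [round(e, 3) for e in faulty])
```

With the honest ledger, the agent holds its target weight at every step. With the stale ledger, the agent believes it is flat at each step and keeps buying with half of its remaining cash, so true exposure rises step by step toward full concentration. Both runs share the same prices and initial cash, mirroring the identical-initial-conditions setup described above, so the difference between them is attributable to the ledger fault alone.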