benchmark 2025

TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?

Lewen Yan, Jilin Mei, Tianyi Zhou, Lige Huang, Jie Zhang, Dongrui Liu, Jing Shao

Shanghai AI Laboratory

1 citation · 29 references · arXiv

Published on arXiv

2512.02261

Excessive Agency

OWASP LLM Top 10 — LLM08

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Small perturbations at any single component of an LLM trading agent propagate through the decision loop, causing extreme portfolio concentration, runaway exposure, and large drawdowns across both adaptive and procedural agent architectures.

TradeTrap

Novel technique introduced

LLM-based trading agents are increasingly deployed in real-world financial markets to perform autonomous analysis and execution. However, their reliability and robustness under adversarial or faulty conditions remain largely unexamined, despite operating in high-risk, irreversible financial environments. We propose TradeTrap, a unified evaluation framework for systematically stress-testing both adaptive and procedural autonomous trading agents. TradeTrap targets four core components of autonomous trading agents: market intelligence, strategy formulation, portfolio and ledger handling, and trade execution, and evaluates their robustness under controlled system-level perturbations. All evaluations are conducted in a closed-loop historical backtesting setting on real US equity market data with identical initial conditions, enabling fair and reproducible comparisons across agents and attacks. Extensive experiments show that small perturbations at a single component can propagate through the agent decision loop and induce extreme concentration, runaway exposure, and large portfolio drawdowns across both agent types, demonstrating that current autonomous trading agents can be systematically misled at the system level. Our code is available at https://github.com/Yanlewen/TradeTrap.
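The closed-loop propagation described above can be illustrated with a toy model. The sketch below is hypothetical and is not TradeTrap's code (the linked repository holds that). It chains four stages named after the components TradeTrap targets and routes each stage's output through a `hook(stage, payload)` so that exactly one stage can be perturbed. The momentum signal, proportional weighting, settlement rule, the tickers `A` and `B`, and all prices are synthetic assumptions chosen only to show how one altered output feeds every later decision.

```python
from typing import Callable, Dict, List

Prices = Dict[str, float]
# Hypothetical hook: receives (stage_name, stage_output), returns a possibly altered output.
Hook = Callable[[str, dict], dict]


def no_attack(stage: str, payload: dict) -> dict:
    return payload


def run_backtest(prices: List[Prices], cash: float, hook: Hook = no_attack) -> List[float]:
    """Toy closed-loop backtest; returns true mark-to-market equity at every bar."""
    shares: Dict[str, float] = {}
    values: List[float] = []
    for t, today in enumerate(prices):
        equity = cash + sum(q * today[k] for k, q in shares.items())
        values.append(equity)
        if t == 0 or t == len(prices) - 1:
            continue  # signals need a previous bar; no trading on the last bar
        prev = prices[t - 1]
        # 1. Market intelligence: one-bar momentum per ticker.
        signals = hook("market_intelligence", {k: today[k] / prev[k] - 1.0 for k in today})
        # 2. Strategy formulation: long-only weights proportional to positive signals.
        positive = {k: max(s, 0.0) for k, s in signals.items()}
        total = sum(positive.values())
        weights = hook("strategy", {k: v / total for k, v in positive.items()} if total > 0 else {})
        # 3. Ledger: the equity the agent believes it has (never re-checked against reality).
        believed = hook("ledger", {"equity": equity})["equity"]
        # 4. Execution: size orders from believed equity; any shortfall becomes negative cash (borrowing).
        shares = hook("execution", {k: w * believed / today[k] for k, w in weights.items()})
        cash = equity - sum(q * today[k] for k, q in shares.items())
    return values


def fake_news_on_a(stage: str, payload: dict) -> dict:
    # Hypothetical market-intelligence attack: a fabricated bullish signal for ticker "A".
    return {**payload, "A": 0.5} if stage == "market_intelligence" else payload


def inflate_ledger(stage: str, payload: dict) -> dict:
    # Hypothetical ledger attack: the agent is told it holds 3x its true equity.
    return {"equity": payload["equity"] * 3.0} if stage == "ledger" else payload


# Synthetic prices for two made-up tickers; "A" drops sharply on the last bar.
prices = [
    {"A": 100.0, "B": 100.0},
    {"A": 101.0, "B": 102.0},
    {"A": 102.0, "B": 103.0},
    {"A": 80.0, "B": 104.0},
]
clean = run_backtest(prices, 1000.0)
attacked = run_backtest(prices, 1000.0, fake_news_on_a)
leveraged = run_backtest(prices, 1000.0, inflate_ledger)
print(f"clean={clean[-1]:.1f} fake_news={attacked[-1]:.1f} inflated_ledger={leveraged[-1]:.1f}")
```

In this toy run, pinning one ticker's signal pushes nearly all capital into `A`, and inflating the believed equity makes the agent trade on borrowed cash; both runs end below the unperturbed one once `A` falls. This only mirrors the qualitative pattern the paper reports, not its experimental setup or numbers.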

Key Contributions

TradeTrap: a unified adversarial evaluation framework targeting four components of LLM trading agents (market intelligence, strategy formulation, portfolio/ledger handling, trade execution)
Demonstrates that small system-level perturbations at a single component cascade through the agent decision loop, inducing extreme concentration, runaway exposure, and large portfolio drawdowns
Closed-loop historical backtesting on real U.S. equity data with identical initial conditions, enabling reproducible cross-agent comparisons across both adaptive and procedural agent types
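The failure modes listed above (extreme concentration, runaway exposure, large drawdowns) can be quantified with standard portfolio metrics. The exact metrics TradeTrap reports are not stated on this page, so the following is an assumption: maximum drawdown over an equity curve, the Herfindahl index of position weights for concentration, and gross exposure relative to equity for leverage. The function names are illustrative, not part of the TradeTrap codebase.

```python
from typing import Dict, List


def max_drawdown(values: List[float]) -> float:
    """Largest peak-to-trough decline as a fraction of the running peak (assumes positive values)."""
    if not values:
        return 0.0
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


def herfindahl(position_values: Dict[str, float]) -> float:
    """Sum of squared weights: 1/N for an equal-weight book, 1.0 when fully concentrated."""
    gross = sum(abs(v) for v in position_values.values())
    if gross == 0:
        return 0.0
    return sum((abs(v) / gross) ** 2 for v in position_values.values())


def gross_exposure(position_values: Dict[str, float], equity: float) -> float:
    """Gross market value divided by equity; values above 1.0 indicate leverage."""
    return sum(abs(v) for v in position_values.values()) / equity


print(max_drawdown([100.0, 120.0, 90.0, 110.0]))  # peak 120, trough 90
print(herfindahl({"A": 50.0, "B": 50.0}), herfindahl({"A": 100.0}))
print(gross_exposure({"A": 3000.0}, 1000.0))
```

Using the Herfindahl index rather than, say, the largest single weight is a design choice: it reflects concentration across the whole book, while the largest weight only captures the single biggest position.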


Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time, digital, targeted

Datasets

US equity market data

Applications

autonomous trading agents, financial market AI systems
