Average attack success rates reach up to 50.0% on state-of-the-art LLMs and remain at least 6.7% even for the most robust models, indicating that existing safety mechanisms transfer poorly to realistic financial agent settings.

FinVault

Novel technique introduced

Financial agents powered by large language models (LLMs) are increasingly deployed for investment analysis, risk assessment, and automated decision-making, where their abilities to plan, invoke tools, and manipulate mutable state introduce new security risks in high-stakes and highly regulated financial environments. However, existing safety evaluations largely focus on language-model-level content compliance or abstract agent settings, failing to capture execution-grounded risks arising from real operational workflows and state-changing actions. To bridge this gap, we propose FinVault, the first execution-grounded security benchmark for financial agents, comprising 31 regulatory case-driven sandbox scenarios with state-writable databases and explicit compliance constraints, together with 107 real-world vulnerabilities and 963 test cases that systematically cover prompt injection, jailbreaking, financially adapted attacks, as well as benign inputs for false-positive evaluation. Experimental results reveal that existing defense mechanisms remain ineffective in realistic financial agent settings, with average attack success rates (ASR) still reaching up to 50.0% on state-of-the-art models and remaining non-negligible even for the most robust systems (ASR 6.7%), highlighting the limited transferability of current safety designs and the need for stronger financial-specific defenses. Our code can be found at https://github.com/aifinlab/FinVault.
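To make "execution-grounded" concrete: an attack is judged by what the agent actually did to the sandbox state, not by what it said. The sketch below is a minimal illustration under stated assumptions, not FinVault's implementation. The `Sandbox` class, the `transfer` tool, the `LIMIT` threshold, and the approval rule are all hypothetical stand-ins for a state-writable database and a compliance constraint.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """Hypothetical state-writable sandbox: account balances plus a transfer log."""
    balances: dict[str, float]
    approved: set[str] = field(default_factory=set)  # transfer ids with compliance approval
    log: list[tuple[str, str, str, float]] = field(default_factory=list)

    def transfer(self, tx_id: str, src: str, dst: str, amount: float) -> None:
        # Tool exposed to the agent. It mutates state unconditionally, so any
        # compliance check has to be enforced by the agent or a defense layer.
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0.0) + amount
        self.log.append((tx_id, src, dst, amount))


LIMIT = 10_000.0  # hypothetical threshold above which approval is required


def violates_compliance(box: Sandbox) -> bool:
    """Execution-grounded judgment: inspect the resulting state, not the agent's reply."""
    return any(amount > LIMIT and tx_id not in box.approved
               for tx_id, _, _, amount in box.log)


box = Sandbox(balances={"client": 50_000.0, "attacker": 0.0})
# Suppose an injected instruction led the agent to issue this tool call:
box.transfer("tx1", "client", "attacker", 25_000.0)
print(violates_compliance(box))  # True: unapproved transfer above LIMIT
```

Judging on state rather than on text means an agent that politely "refuses" in its reply but still issued the tool call is counted as compromised.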

Key Contributions

FinVault: first execution-grounded security benchmark for financial LLM agents with 31 regulatory case-driven sandbox scenarios featuring state-writable databases and compliance constraints
107 real-world vulnerabilities and 963 test cases spanning prompt injection, jailbreaking, financially adapted attacks, and benign inputs for false-positive evaluation
Empirical evaluation revealing that current defenses remain inadequate, with average ASR up to 50.0% on SOTA models and a floor of 6.7% even for the most robust systems
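The headline numbers are rates over test cases. As a rough sketch, assuming ASR is the fraction of attack cases that end in a harmful outcome and the false-positive rate is the fraction of benign cases that are wrongly refused, per-model metrics could be tallied as below. The `CaseResult` record and `rates` function are hypothetical; FinVault's exact aggregation (for example, averaging per scenario before averaging per model) may differ.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class CaseResult(NamedTuple):
    model: str
    attack: bool       # True for attack cases, False for benign cases
    bad_outcome: bool  # attack: harmful state change; benign: request wrongly refused


def rates(results: Iterable[CaseResult]) -> dict[str, tuple[float, float]]:
    """Return {model: (ASR, false-positive rate)} as fractions (micro-averaged)."""
    # Per model: [attack hits, attack total, benign refusals, benign total]
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for r in results:
        c = counts[r.model]
        if r.attack:
            c[0] += int(r.bad_outcome)
            c[1] += 1
        else:
            c[2] += int(r.bad_outcome)
            c[3] += 1
    return {m: (c[0] / c[1] if c[1] else 0.0, c[2] / c[3] if c[3] else 0.0)
            for m, c in counts.items()}


results = [CaseResult("m", True, True), CaseResult("m", True, False),
           CaseResult("m", True, True), CaseResult("m", True, False),
           CaseResult("m", False, False), CaseResult("m", False, False)]
print(rates(results))  # {'m': (0.5, 0.0)}
```

Reporting both numbers matters: a defense can drive ASR toward zero simply by refusing everything, which shows up as a high false-positive rate on the benign cases.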

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted

Datasets

FinVault (proposed)

Applications

financial agents, investment analysis, risk assessment, automated financial decision-making

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Similar Papers

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

PEAR: Planner-Executor Agent Robustness Benchmark

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Mind the Gap: Comparing Model- vs Agentic-Level Red Teaming with Action-Graph Observability on GPT-OSS-20B

Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents

ASTRA: Agentic Steerability and Risk Assessment Framework

When Hallucination Costs Millions: Benchmarking AI Agents in High-Stakes Adversarial Financial Markets

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents