benchmark 2026

Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

Ari Azarafrooz

Intrinsec AI


Published on arXiv

2604.21131

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

The bounded-memory Coreset Memory Reader at K=50 is the only architecture whose attack recall survives both the dilution and the cross_session (rewritten) shards, while the session-bound judge and the Full-Log Correlator each lose roughly half their attack recall on cross-session attacks.

Coreset Memory Reader

Novel technique introduced
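
As an illustration only, the retention policy described for the Coreset Memory Reader (keep the K highest-signal fragments seen across sessions, with K=50 in the paper) can be sketched as a size-bounded min-heap. The summary does not specify the scoring function, so this sketch assumes each fragment arrives with a precomputed signal score. The class name CoresetMemoryReader, its observe/context methods, and the arrival-order serialization are hypothetical, not the paper's implementation.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Fragment:
    # Only (score, seq) take part in comparisons, so the heap root is the lowest-signal fragment.
    score: float
    seq: int
    session_id: str = field(compare=False)
    text: str = field(compare=False)


class CoresetMemoryReader:
    """Hypothetical sketch: retain at most k highest-scoring fragments across all sessions."""

    def __init__(self, k: int = 50) -> None:
        self.k = k
        self._heap: List[Fragment] = []  # min-heap keyed on (score, seq)
        self._seq = 0  # arrival counter; also breaks score ties (ties go to the newer fragment)

    def observe(self, session_id: str, text: str, score: float) -> None:
        frag = Fragment(score, self._seq, session_id, text)
        self._seq += 1
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, frag)
        elif frag > self._heap[0]:
            # Memory is full: evict the current lowest-signal fragment for the new one.
            heapq.heapreplace(self._heap, frag)

    def context(self) -> List[str]:
        # Serialize retained fragments in arrival order (an assumption, not from the paper).
        return [f"[{f.session_id}] {f.text}" for f in sorted(self._heap, key=lambda f: f.seq)]


reader = CoresetMemoryReader(k=2)
reader.observe("s1", "benign chat", 0.1)
reader.observe("s2", "asks for customer export, part 1", 0.7)
reader.observe("s3", "asks for customer export, part 2", 0.8)
print(reader.context())  # ['[s2] asks for customer export, part 1', '[s3] asks for customer export, part 2']
```

Memory stays bounded at K fragments no matter how many sessions are observed, which is what keeps the downstream correlator prompt small. Serializing in arrival order rather than by score is one way to limit how often the prompt is reshuffled; an eviction still changes the context from the evicted position onward.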

AI-agent guardrails are memoryless: each message is judged in isolation, so an adversary who spreads a single attack across dozens of sessions slips past every session-bound detector, because only the aggregate carries the payload. We make three contributions to cross-session threat detection.

(1) Dataset. CSTM-Bench comprises 26 executable attack taxonomies classified by kill-chain stage and cross-session operation (accumulate, compose, launder, inject_on_reader), each bound to one of seven identity anchors that ground-truth "violation" as a policy predicate, plus matched Benign-pristine and Benign-hard confounders. It is released on Hugging Face as intrinsec-ai/cstm-bench with two 54-scenario splits: dilution (compositional) and cross_session (12 isolation-invisible scenarios produced by a closed-loop rewriter that softens surface phrasing while preserving cross-session artefacts).

(2) Measurement. Framing cross-session detection as an information bottleneck to a downstream correlator LLM, we find that a session-bound judge and a Full-Log Correlator that concatenates every prompt into one long-context call both lose roughly half their attack recall moving from dilution to cross_session, even though the full log sits well inside any frontier context window. Scope: 54 scenarios per shard, one correlator family (Anthropic Claude), no prompt optimisation; we release the benchmark to motivate larger, multi-provider datasets.

(3) Algorithm and metric. A bounded-memory Coreset Memory Reader retaining the highest-signal fragments at $K=50$ is the only reader whose recall survives both shards. Because ranker reshuffles break KV-cache prefix reuse, we promote $\mathrm{CSR\_prefix}$ (ordered prefix stability, LLM-free) to a first-class metric and fuse it with detection into $\mathrm{CSTM} = 0.7\, F_1(\mathrm{CSDA@action}, \mathrm{precision}) + 0.3\, \mathrm{CSR\_prefix}$, benchmarking rankers on a single Pareto frontier of recall versus serving stability.
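
To make the isolation problem concrete, here is a toy sketch of the accumulate operation under a hypothetical identity-anchored policy predicate: one identity may export at most 100 records in total. The limit, the event tuples, and both judge functions are invented for illustration. They show why a per-session check passes every fragment while the same predicate, evaluated over the per-identity aggregate, fires.

```python
from collections import defaultdict
from typing import List, Set, Tuple

# Hypothetical identity-anchored policy: an identity may export at most 100 records in total.
EXPORT_LIMIT = 100

# (identity, session_id, records_exported); every individual session stays under the limit.
events: List[Tuple[str, str, int]] = [
    ("alice", "sess-1", 40),
    ("alice", "sess-2", 35),
    ("alice", "sess-3", 30),
]


def session_bound_judge(events: List[Tuple[str, str, int]]) -> Set[Tuple[str, str]]:
    """Memoryless guardrail: flags a session only if that session alone violates the policy."""
    per_session = defaultdict(int)
    for identity, session, n in events:
        per_session[(identity, session)] += n
    return {key for key, total in per_session.items() if total > EXPORT_LIMIT}


def cross_session_judge(events: List[Tuple[str, str, int]]) -> Set[str]:
    """Evaluates the same predicate over the aggregate for each identity anchor."""
    per_identity = defaultdict(int)
    for identity, _session, n in events:
        per_identity[identity] += n
    return {identity for identity, total in per_identity.items() if total > EXPORT_LIMIT}


print(session_bound_judge(events))  # set()
print(cross_session_judge(events))  # {'alice'}
```

The violation (40 + 35 + 30 = 105 records) exists only in the aggregate, which is the sense in which the payload is invisible to a detector that judges each session in isolation.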

Key Contributions

CSTM-Bench dataset with 26 attack taxonomies classified by kill-chain stage and cross-session operations (accumulate, compose, launder, inject_on_reader)
Empirical finding that both session-bound judges and full-log correlators lose ~50% of their attack recall on cross-session-only attacks, even though the full logs fit well within frontier context windows
Coreset Memory Reader architecture retaining the top K=50 signal fragments, the only reader whose recall holds up across both the compositional (dilution) and cross-session attack shards
CSTM composite metric fusing detection F1 with CSR_prefix (prefix stability) to benchmark rankers on a recall vs. serving-cost Pareto frontier
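
The CSTM formula can be computed directly once its inputs are available. In the sketch below, f1 is the standard harmonic mean applied to CSDA@action and precision, and the 0.7/0.3 weights come from the paper's formula. The summary does not spell out how CSR_prefix is computed, so csr_prefix here is a hypothetical stand-in: the mean fraction of each ordered reader context that survives unchanged as the prefix of the next one, i.e. the part a KV cache could reuse.

```python
from typing import List, Sequence


def f1(a: float, b: float) -> float:
    """Harmonic mean of two rates; 0.0 when both are 0."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def common_prefix_len(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Number of leading items the two ordered contexts share exactly."""
    n = 0
    for x, y in zip(prev, curr):
        if x != y:
            break
        n += 1
    return n


def csr_prefix(snapshots: List[List[str]]) -> float:
    """Hypothetical CSR_prefix proxy: mean share of the previous context kept as the next prefix."""
    ratios = [
        common_prefix_len(prev, curr) / len(prev)
        for prev, curr in zip(snapshots, snapshots[1:])
        if prev  # an empty previous context has nothing to reuse
    ]
    # With no consecutive pairs to compare, treat the sequence as fully stable (a design choice).
    return sum(ratios) / len(ratios) if ratios else 1.0


def cstm(csda_at_action: float, precision: float, csr: float) -> float:
    # Weights from the paper: 0.7 on detection F1, 0.3 on prefix stability.
    return 0.7 * f1(csda_at_action, precision) + 0.3 * csr


# Three consecutive reader contexts: an append keeps the prefix, a reshuffle breaks it.
snapshots = [["a", "b"], ["a", "b", "c"], ["a", "c", "b", "d"]]
print(csr_prefix(snapshots))
print(cstm(0.8, 0.5, csr_prefix(snapshots)))
```

Under this proxy, a ranker that re-sorts its retained fragments by score whenever a higher-signal fragment arrives pays in CSR_prefix, whereas appending in arrival order keeps the prefix intact until an eviction occurs.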

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted

Datasets

CSTM-Bench (intrinsec-ai/cstm-bench on Hugging Face)

Applications

ai agent security, guardrail systems, agentic workflow protection

Similar Papers

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

PEAR: Planner-Executor Agent Robustness Benchmark

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

Mind the Gap: Comparing Model- vs Agentic-Level Red Teaming with Action-Graph Observability on GPT-OSS-20B

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing

Exposing Weak Links in Multi-Agent Systems under Adversarial Prompting

WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web Agents