Achieves an 84% mean aggregate breakthrough rate (range 83-86%) across 5 production LLM platforms, with platform-level rates stable within 17 percentage points across 3 independent runs

LAAF

Novel technique introduced

Agentic LLM systems equipped with persistent memory, RAG pipelines, and external tool connectors face a class of attacks, Logic-layer Prompt Control Injection (LPCI), for which no automated red-teaming instrument existed. We present LAAF (Logic-layer Automated Attack Framework), the first automated red-teaming framework to combine an LPCI-specific technique taxonomy with stage-sequential seed escalation. Both capabilities are absent from existing tools: Garak lacks memory-persistence and cross-session triggering; PyRIT supports multi-turn testing but treats turns independently, without seeding each stage from the prior breakthrough. LAAF provides: (i) a 49-technique taxonomy spanning six attack categories (Encoding 11, Structural 8, Semantic 8, Layered 5, Trigger 12, Exfiltration 5; see Table 1), combinable across 5 variants per technique and 6 lifecycle stages, yielding a theoretical maximum of 2,822,400 unique payloads ($49 \times 5 \times 1{,}920 \times 6$; SHA-256 deduplicated at generation time); and (ii) a Persistent Stage Breaker (PSB) that drives payload mutation stage-by-stage: on each breakthrough, the PSB seeds the next stage with a mutated form of the winning payload, mirroring real adversarial escalation. Evaluation on five production LLM platforms across three independent runs demonstrates that LAAF achieves higher stage-breakthrough efficiency than single-technique random testing, with a mean aggregate breakthrough rate of 84% (range 83-86%) and platform-level rates stable within 17 percentage points across runs. Layered combinations and semantic reframing are the highest-effectiveness technique categories, with layered payloads outperforming encoding on well-defended platforms.
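The payload-space figure in the abstract can be checked with a few lines of arithmetic, and the generation-time deduplication can be sketched with a set of SHA-256 digests. This is a minimal sketch under stated assumptions, not the framework's code: the factor 1,920 is taken from the formula as printed (this summary does not say what it counts), and the names `theoretical_max`, `dedupe`, and `COMBINATION_FACTOR` are hypothetical.

```python
import hashlib

# Factors exactly as they appear in the abstract's formula. The meaning of
# 1,920 is not explained in this summary, so it is kept as an opaque multiplier.
TECHNIQUES = 49
VARIANTS = 5
COMBINATION_FACTOR = 1_920  # assumption: name is ours, value is from the formula
STAGES = 6

# Per-category technique counts quoted in the abstract.
CATEGORY_COUNTS = {
    "Encoding": 11,
    "Structural": 8,
    "Semantic": 8,
    "Layered": 5,
    "Trigger": 12,
    "Exfiltration": 5,
}


def theoretical_max() -> int:
    """Multiply the four factors from the abstract's formula."""
    return TECHNIQUES * VARIANTS * COMBINATION_FACTOR * STAGES


def dedupe(payloads):
    """Yield each distinct payload once, keyed by its SHA-256 digest.

    Hypothetical helper: it only illustrates the idea of hash-based
    deduplication at generation time.
    """
    seen = set()
    for payload in payloads:
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield payload
```

The category counts sum to the 49 techniques, and the product of the four factors reproduces the 2,822,400 figure quoted above.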

Key Contributions

49-technique LPCI attack taxonomy spanning 6 categories (Encoding, Structural, Semantic, Layered, Trigger, Exfiltration) generating up to 2.8M unique payloads
Persistent Stage Breaker (PSB) mechanism that seeds each attack stage with mutated winning payloads from previous breakthroughs, modeling real adversarial escalation
Automated red-teaming framework achieving 84% mean breakthrough rate across 5 production platforms, demonstrating layered and semantic techniques outperform encoding on defended systems
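The stage-sequential seeding described for the PSB can be illustrated with a short control loop. This is a sketch of the idea as the summary states it, not the authors' implementation: `run_psb`, `mutate`, `attempt`, and `max_attempts` are hypothetical names, and the target is a harmless stub rather than a real platform.

```python
import random
from typing import Callable, List, Optional


def run_psb(
    stages: List[str],
    seed: str,
    mutate: Callable[[str, random.Random], str],
    attempt: Callable[[str, str], bool],
    max_attempts: int = 50,
    rng: Optional[random.Random] = None,
) -> List[Optional[str]]:
    """Escalate through stages, seeding each from the previous winner.

    Returns one entry per stage reached: the winning candidate, or None
    if the attempt budget ran out (escalation stops at that stage).
    """
    rng = rng or random.Random(0)  # fixed seed keeps runs reproducible
    winners: List[Optional[str]] = []
    current = seed
    for stage in stages:
        winner = None
        for _ in range(max_attempts):
            # Every candidate is a mutated form of the current seed.
            candidate = mutate(current, rng)
            if attempt(stage, candidate):
                winner = candidate
                break
        winners.append(winner)
        if winner is None:
            break  # no breakthrough at this stage, nothing to seed from
        current = winner  # the winning payload seeds the next stage
    return winners


# Toy stand-ins: the "mutation" appends a marker and the stub target
# accepts any candidate ending in that marker.
demo = run_psb(
    stages=["S1", "S2", "S3"],
    seed="seed",
    mutate=lambda p, rng: p + "x",
    attempt=lambda stage, p: p.endswith("x"),
)
# demo == ["seedx", "seedxx", "seedxxx"]
```

The contrast with the independent-turn testing mentioned in the abstract is the `current = winner` line: each stage starts from the last breakthrough rather than from the original seed.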

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted

Applications

agentic ai systems, rag pipelines, llm memory systems

LAAF: Logic-layer Automated Attack Framework A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

Similar Papers

MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors

ClawTrap: A MITM-Based Red-Teaming Framework for Real-World OpenClaw Security Evaluation

DREAM: Dynamic Red-teaming across Environments for AI Models

AgentSight: System-Level Observability for AI Agents Using eBPF

AJAR: Adaptive Jailbreak Architecture for Red-teaming

When Agents See Humans as the Outgroup: Belief-Dependent Bias in LLM-Powered Agents

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents