attack 2026

Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace

Qianlong Lan , Anuj Kaul , Shaun Jones , Stephanie Westrum

eBay

0 citations

Published on arXiv

2602.22450

Prompt Injection

OWASP LLM Top 10 — LLM01

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Key Finding

Implicit prompt injection via URL metadata achieves P(egress)=0.89 with 95% of attacks undetected by output-based safety checks; sharded exfiltration reduces Leak@1 by 73% while bypassing DLP mechanisms

Sharded Exfiltration

Novel technique introduced

Agentic large language model systems increasingly automate tasks by retrieving URLs and calling external tools. We show that this workflow gives rise to implicit prompt injection: adversarial instructions embedded in automatically generated URL previews, including titles, metadata, and snippets, can introduce a system-level risk that we refer to as silent egress. Using a fully local and reproducible testbed, we demonstrate that a malicious web page can induce an agent to issue outbound requests that exfiltrate sensitive runtime context, even when the final response shown to the user appears harmless. In 480 experimental runs with a qwen2.5:7b-based agent, the attack succeeds with high probability (P (egress) =0.89), and 95% of successful attacks are not detected by output-based safety checks. We also introduce sharded exfiltration, where sensitive information is split across multiple requests to avoid detection. This strategy reduces single-request leakage metrics by 73% (Leak@1) and bypasses simple data loss prevention mechanisms. Our ablation results indicate that defenses applied at the prompt layer offer limited protection, while controls at the system and network layers, such as domain allowlisting and redirect-chain analysis, are considerably more effective. These findings suggest that network egress should be treated as a first-class security outcome in agentic LLM systems. We outline architectural directions, including provenance tracking and capability isolation, that go beyond prompt-level hardening.

Key Contributions

Defines implicit prompt injection as a distinct attack subclass where adversarial instructions enter the LLM agent context through automatic URL preview/metadata extraction — invisible to both the user and output-based safety monitors
Introduces sharded exfiltration, a technique that splits sensitive runtime context across multiple outbound requests, reducing per-request Leak@1 metrics by 73% and bypassing simple data loss prevention mechanisms
Empirically demonstrates P(egress)=0.89 over 480 runs on a qwen2.5:7b agent, with 95% of successful attacks undetected by output-based checks, and shows network/system-layer controls (domain allowlisting, redirect-chain analysis) are substantially more effective than prompt-layer defenses

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_timeblack_box

Applications

llm agents with url retrievalagentic ai assistantstool-augmented llm systems

Read PDF arXiv

Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

CLIOPATRA: Extracting Private Information from LLM Insights

EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System

Bypassing Prompt Guards in Production with Controlled-Release Prompting

OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage

Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection

Tricking LLM-Based NPCs into Spilling Secrets

External Data Extraction Attacks against Retrieval-Augmented Large Language Models