benchmark 2026

The Synthetic Web: Adversarially-Curated Mini-Internets for Diagnosing Epistemic Weaknesses of Language Agents

Shrey Shah, Levent Ozgur

0 citations


Published on arXiv (2603.00801)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

A single misinformation article at rank 0 collapses GPT-5 accuracy from 65.1% to 18.2%, with similar catastrophic drops across o3, o1, and 4o models despite unlimited access to truthful sources.

Synthetic Web Benchmark

Novel technique introduced


Language agents increasingly act as web-enabled systems that search, browse, and synthesize information from diverse sources. However, these sources can include unreliable or adversarial content, and the robustness of agents to adversarial ranking, where misleading information appears prominently in search results, remains poorly understood. Existing benchmarks evaluate functional navigation or static factuality but cannot causally isolate this vulnerability, and current mitigation strategies for retrieval-augmented generation remain largely untested under such conditions. We introduce the Synthetic Web Benchmark, a procedurally generated environment comprising thousands of hyperlinked articles with ground-truth labels for credibility and factuality, process-level interaction traces, and contamination filtering to eliminate training-data leakage. By injecting a single high-plausibility misinformation article at a controllable search rank, we measure the causal effect of adversarial exposure across six frontier models. The results reveal catastrophic failures: accuracy collapses despite unlimited access to truthful sources, with minimal search escalation and severe miscalibration. These findings expose fundamental limitations in how current frontier models handle conflicting information, with immediate implications for deployment in high-stakes domains. Our benchmark enables systematic analysis of these failure modes and provides a controlled testbed for evaluating mitigation strategies under adversarial ranking, a gap in current research. This work establishes a reproducible baseline for developing search-robust and epistemically humble agents capable of resisting manipulation in high-stakes domains.


Key Contributions

  • Synthetic Web Benchmark: a procedurally generated environment of thousands of hyperlinked articles with ground-truth credibility/factuality labels, contamination filtering to eliminate training-data leakage, and process-level interaction traces
  • Rank-controlled adversarial exposure protocol that causally measures the effect of injecting a single high-plausibility misinformation article at rank 0 across six frontier models
  • Diagnosis of three systematic failure modes — minimal search escalation, poor multi-source synthesis, and severe miscalibration — exposing fundamental epistemic weaknesses in current web-enabled language agents
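The rank-controlled exposure protocol described above can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the `agent` callable, the document lists, and the condition names are all hypothetical stand-ins, and the only experimental manipulation is where (or whether) the single misinformation article appears in the ranked results.

```python
def build_search_results(truthful_docs, misinfo_doc, inject_rank):
    """Return a ranked result list with one misinformation article
    placed at a controlled rank; every other slot is a truthful source."""
    results = list(truthful_docs)
    results.insert(inject_rank, misinfo_doc)
    return results

def run_condition(agent, question, truthful_docs, misinfo_doc, inject_rank=0):
    """One paired trial: the agent answers the same question with and
    without the injected article, so any accuracy gap is attributable
    to the single manipulated document (hypothetical harness)."""
    poisoned = build_search_results(truthful_docs, misinfo_doc, inject_rank)
    control = list(truthful_docs)  # identical results, minus the injection
    return {
        "adversarial": agent(question, poisoned),
        "control": agent(question, control),
    }
```

Because the two conditions differ only in the injected document, this paired design is what lets the benchmark claim a causal (rather than merely correlational) effect of adversarial ranking.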

🛡️ Threat Analysis

Input Manipulation Attack

The adversarial ranking attack injects a strategically crafted, high-plausibility misinformation article into a search-augmented LLM system. This is a form of adversarial SEO poisoning / document injection for RAG, explicitly listed under ML01's dual-tagging criteria for adversarial content manipulation of LLM-integrated systems.
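The headline metric behind the key finding (e.g. 65.1% collapsing to 18.2%) is the accuracy drop between the control and adversarial conditions. A minimal sketch, assuming hypothetical per-question answer lists and exact-match grading rather than whatever grader the paper actually uses:

```python
def accuracy(answers, gold):
    """Fraction of answers that exactly match the ground-truth labels."""
    assert len(answers) == len(gold)
    return sum(a == g for a, g in zip(answers, gold)) / len(gold)

def causal_effect(control_answers, adversarial_answers, gold):
    """Accuracy drop attributable to the injected article: since the two
    conditions differ only in the rank-0 document, this difference is the
    measured causal effect of adversarial exposure."""
    return accuracy(control_answers, gold) - accuracy(adversarial_answers, gold)
```

On the reported GPT-5 numbers this effect would be 0.651 - 0.182 = 0.469, i.e. a 46.9-point collapse from a single document.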


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Datasets
Synthetic Web Benchmark (authors' own procedurally generated corpus)
Applications
web-enabled language agents, retrieval-augmented generation, search-based question answering