benchmark 2026

The Synthetic Web: Adversarially-Curated Mini-Internets for Diagnosing Epistemic Weaknesses of Language Agents

Shrey Shah, Levent Ozgur

0 citations


Published on arXiv (2603.00801)

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

A single misinformation article at rank 0 collapses GPT-5 accuracy from 65.1% to 18.2%, with similar catastrophic drops across o3, o1, and 4o models despite unlimited access to truthful sources.

Synthetic Web Benchmark

Novel technique introduced


Language agents increasingly act as web-enabled systems that search, browse, and synthesize information from diverse sources. However, these sources can include unreliable or adversarial content, and the robustness of agents to adversarial ranking, where misleading information appears prominently in search results, remains poorly understood. Existing benchmarks evaluate functional navigation or static factuality but cannot causally isolate this vulnerability, and current mitigation strategies for retrieval-augmented generation remain largely untested under such conditions. We introduce the Synthetic Web Benchmark, a procedurally generated environment comprising thousands of hyperlinked articles with ground-truth labels for credibility and factuality, process-level interaction traces, and contamination filtering to eliminate training-data leakage. By injecting a single high-plausibility misinformation article at a controllable search rank, we measure the causal effect of adversarial exposure across six frontier models. The results reveal catastrophic failures: accuracy collapses despite unlimited access to truthful sources, with minimal search escalation and severe miscalibration. These findings expose fundamental limitations in how current frontier models handle conflicting information, with immediate implications for deployment in high-stakes domains. Our benchmark enables systematic analysis of these failure modes and provides a controlled testbed for evaluating mitigation strategies under adversarial ranking, a gap in current research. This work establishes a reproducible baseline for developing search-robust and epistemically humble agents capable of resisting manipulation in high-stakes domains.


Key Contributions

  • Synthetic Web Benchmark: a procedurally generated environment of thousands of hyperlinked articles with ground-truth credibility/factuality labels, contamination filtering to eliminate training-data leakage, and process-level interaction traces
  • Rank-controlled adversarial exposure protocol that causally measures the effect of injecting a single high-plausibility misinformation article at rank 0 across six frontier models
  • Diagnosis of three systematic failure modes — minimal search escalation, poor multi-source synthesis, and severe miscalibration — exposing fundamental epistemic weaknesses in current web-enabled language agents
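The rank-controlled exposure protocol described above can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the `agent` callable, the document lists, and the condition names are all hypothetical stand-ins, and the only experimental manipulation is where (or whether) the single misinformation article appears in the ranked results.

```python
def build_search_results(truthful_docs, misinfo_doc, inject_rank):
    """Return a ranked result list with one misinformation article
    placed at a controlled rank; every other slot is a truthful source."""
    results = list(truthful_docs)
    results.insert(inject_rank, misinfo_doc)
    return results

def run_condition(agent, question, truthful_docs, misinfo_doc, inject_rank=0):
    """One paired trial: the agent answers the same question with and
    without the injected article, so any accuracy gap is attributable
    to the single manipulated document (hypothetical harness)."""
    poisoned = build_search_results(truthful_docs, misinfo_doc, inject_rank)
    control = list(truthful_docs)  # identical results, minus the injection
    return {
        "adversarial": agent(question, poisoned),
        "control": agent(question, control),
    }
```

Because the two conditions differ only in the injected document, this paired design is what lets the benchmark claim a causal (rather than merely correlational) effect of adversarial ranking.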

🛡️ Threat Analysis

Input Manipulation Attack

The adversarial ranking attack injects a strategically crafted, high-plausibility misinformation article into a search-augmented LLM system. This is a form of adversarial SEO poisoning / document injection for RAG, explicitly listed under ML01's dual-tagging criteria for adversarial content manipulation of LLM-integrated systems.
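The headline metric behind the key finding (e.g. 65.1% collapsing to 18.2%) is the accuracy drop between the control and adversarial conditions. A minimal sketch, assuming hypothetical per-question answer lists and exact-match grading rather than whatever grader the paper actually uses:

```python
def accuracy(answers, gold):
    """Fraction of answers that exactly match the ground-truth labels."""
    assert len(answers) == len(gold)
    return sum(a == g for a, g in zip(answers, gold)) / len(gold)

def causal_effect(control_answers, adversarial_answers, gold):
    """Accuracy drop attributable to the injected article: since the two
    conditions differ only in the rank-0 document, this difference is the
    measured causal effect of adversarial exposure."""
    return accuracy(control_answers, gold) - accuracy(adversarial_answers, gold)
```

On the reported GPT-5 numbers this effect would be 0.651 - 0.182 = 0.469, i.e. a 46.9-point collapse from a single document.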


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Datasets
Synthetic Web Benchmark (authors' own procedurally generated corpus)
Applications
web-enabled language agents, retrieval-augmented generation, search-based question answering