Shrey Shah

Papers in Database (1)

benchmark · arXiv · Feb 28, 2026

The Synthetic Web: Adversarially-Curated Mini-Internets for Diagnosing Epistemic Weaknesses of Language Agents

Shrey Shah, Levent Ozgur · Microsoft

A benchmark revealing that frontier LLMs fail catastrophically when a single misinformation article tops web search results, even though truthful sources remain accessible

Input Manipulation Attack · Prompt Injection · nlp