
Anecdoctoring: Automated Red-Teaming Across Language and Place

Alejandro Cuevas 1,2, Saloni Dash 3, Bharat Kumar Nayak 4, Dan Vann 2, Madeleine I. G. Daepp 2

2 citations · 1 influential · 62 references · EMNLP


Published on arXiv · 2509.19143

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Anecdoctoring achieves higher attack success rates than few-shot prompting while offering interpretability through knowledge graph characterization of misinformation narratives.

Anecdoctoring

Novel technique introduced


Disinformation is among the top risks of generative artificial intelligence (AI) misuse. Global adoption of generative AI necessitates red-teaming evaluations (i.e., systematic adversarial probing) that are robust across diverse languages and cultures, but red-teaming datasets are commonly US- and English-centric. To address this gap, we propose "anecdoctoring", a novel red-teaming approach that automatically generates adversarial prompts across languages and cultures. We collect misinformation claims from fact-checking websites in three languages (English, Spanish, and Hindi) and two geographies (US and India). We then cluster individual claims into broader narratives and characterize the resulting clusters with knowledge graphs, with which we augment an attacker LLM. Our method produces higher attack success rates and offers interpretability benefits relative to few-shot prompting. Results underscore the need for disinformation mitigations that scale globally and are grounded in real-world adversarial misuse.
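The pipeline the abstract describes (collect fact-checked claims, cluster them into narratives, characterize each narrative with a knowledge graph, and feed that graph to an attacker LLM) can be sketched as follows. This is a minimal illustration under stated assumptions: the toy claims, the topic-label clustering, the naive triple extraction, and the prompt template are all hypothetical stand-ins; the paper's actual components (semantic clustering, LLM-based knowledge-graph construction, attacker prompts) are not reproduced here.

```python
"""Minimal sketch of an anecdoctoring-style pipeline (illustrative only)."""
from collections import defaultdict

# Toy fact-checked claims with topic labels (stand-ins for scraped
# fact-checking data; the paper clusters claims semantically instead).
CLAIMS = [
    ("5G towers spread the virus", "health"),
    ("Vaccines contain microchips", "health"),
    ("Ballots were shredded overnight", "election"),
]

def cluster_claims(claims):
    """Group individual claims into broader narratives.
    Here we cluster by a precomputed topic label as a placeholder."""
    narratives = defaultdict(list)
    for text, topic in claims:
        narratives[topic].append(text)
    return dict(narratives)

def extract_triples(claim):
    """Characterize a claim as (subject, relation, object) triples.
    A real system would use an LLM or IE model; this naive split just
    takes the first word as subject and second as relation."""
    words = claim.split()
    return [(words[0], words[1], " ".join(words[2:]))]

def build_attacker_prompt(narrative, triples):
    """Augment an attacker prompt with knowledge-graph facts that
    characterize the misinformation narrative."""
    facts = "; ".join(f"{s} --{r}--> {o}" for s, r, o in triples)
    return (f"Narrative: {narrative}\n"
            f"Knowledge graph: {facts}\n"
            f"Write a new claim consistent with this narrative.")

narratives = cluster_claims(CLAIMS)
kg = {name: [t for c in texts for t in extract_triples(c)]
      for name, texts in narratives.items()}
prompt = build_attacker_prompt("health", kg["health"])
```

The interpretability benefit the paper reports comes from this middle step: the knowledge graph makes each narrative's entities and relations inspectable, rather than hiding them inside opaque few-shot exemplars.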


Key Contributions

  • Novel 'anecdoctoring' red-teaming methodology that clusters real-world misinformation claims into narratives, characterizes them with knowledge graphs, and uses graph-augmented LLM attackers to generate adversarial prompts
  • Multilingual, multi-geography red-teaming dataset spanning English, Spanish, and Hindi across US and Indian contexts
  • Demonstrates higher attack success rates and interpretability benefits over few-shot prompting baselines

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box · inference_time · targeted
Datasets
Fact-checking website claims (English, Spanish, Hindi; US and India)
Applications
llm safety evaluation · disinformation red-teaming · multilingual ai safety