Benchmark · 2026

The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs

Xiangzhe Yuan 1, Zhenhao Zhang 2, Haoming Tang 2, Siying Hu 2

0 citations · 34 references · arXiv

Published on arXiv · 2601.03134

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Multi-turn scam interactions follow recurrent escalation patterns; interactional failures most often stem from safety guardrail activation and role instability, with Claude 4.5 showing the strongest combined attack and defense performance.

ScamBot/VictimBot LLM-to-LLM simulation framework

Novel technique introduced


As LLMs gain persuasive agentic capabilities through extended dialogues, they introduce novel risks in multi-turn conversational scams that single-turn safety evaluations fail to capture. We systematically study these risks using a controlled LLM-to-LLM simulation framework across multi-turn scam scenarios. Evaluating eight state-of-the-art models in English and Chinese, we analyze dialogue outcomes and qualitatively annotate attacker strategies, defensive responses, and failure modes. Results reveal that scam interactions follow recurrent escalation patterns, while defenses employ verification and delay mechanisms. Furthermore, interactional failures frequently stem from safety guardrail activation and role instability. Our findings highlight multi-turn interactional safety as a critical, distinct dimension of LLM behavior.


Key Contributions

  • LLM-to-LLM simulation framework (ScamBot vs VictimBot) for controlled, reproducible multi-turn scam evaluation across 10 fictional fraud categories
  • BERTopic-based latent strategy space analysis of 18,648 dialogues revealing recurrent attacker escalation patterns and defender verification/delay mechanisms
  • Cross-lingual (English and Chinese) safety evaluation of 8 state-of-the-art LLMs with a four-outcome label space (SUCCESS, DETECTED, NO_RESOLUTION, ERROR)
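The paper does not publish its harness, so the following is a minimal sketch of what an LLM-to-LLM ScamBot/VictimBot loop with the four-outcome label space (SUCCESS, DETECTED, NO_RESOLUTION, ERROR) might look like. All names, the turn budget, and the keyword-based outcome checks are illustrative assumptions, with toy functions standing in for real model calls.

```python
from enum import Enum
from typing import Callable, List, Tuple

# Hypothetical sketch only: the paper's actual framework is not released,
# so every identifier and heuristic below is an assumption.

class Outcome(Enum):
    SUCCESS = "SUCCESS"              # victim complied with the scam
    DETECTED = "DETECTED"            # victim flagged the interaction as a scam
    NO_RESOLUTION = "NO_RESOLUTION"  # turn budget exhausted, no clear outcome
    ERROR = "ERROR"                  # a model errored or broke character

def run_dialogue(
    scam_bot: Callable[[List[str]], str],
    victim_bot: Callable[[List[str]], str],
    max_turns: int = 10,
) -> Tuple[Outcome, List[str]]:
    """Alternate ScamBot/VictimBot turns and classify the outcome."""
    history: List[str] = []
    for _ in range(max_turns):
        for bot in (scam_bot, victim_bot):
            try:
                reply = bot(history)
            except RuntimeError:
                return Outcome.ERROR, history
            history.append(reply)
            lowered = reply.lower()
            # Toy outcome detectors; a real harness would use a judge model.
            if "this is a scam" in lowered:
                return Outcome.DETECTED, history
            if "i'll transfer" in lowered:
                return Outcome.SUCCESS, history
    return Outcome.NO_RESOLUTION, history

# Toy stand-ins for real LLM calls:
def toy_scammer(history):
    return "Urgent: your account is locked, send the fee now."

def toy_victim(history):
    # Grows suspicious after a couple of pressure turns.
    return "This is a scam." if len(history) >= 4 else "Which account?"

outcome, transcript = run_dialogue(toy_scammer, toy_victim)
print(outcome.value)  # → DETECTED
```

In a real evaluation each bot would wrap an API call with a role-playing system prompt, and outcome labeling would be done by an annotator or judge model rather than keyword matching; the loop structure and label space stay the same.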

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm · transformer
Threat Tags
inference_time · black_box
Datasets
18,648 LLM-to-LLM scam dialogue dataset (authors' own)
Applications
conversational ai safety · social engineering detection · llm red-teaming