ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls
Published on arXiv: 2508.06457
Tags:
- Prompt Injection (OWASP LLM Top 10, LLM01)
- Excessive Agency (OWASP LLM Top 10, LLM08)
Key Finding
Current LLM safety guardrails — including those in models with strong prompt-level safeguards — fail to prevent scam script generation when harmful goals are decomposed and delivered incrementally through an autonomous agent framework.
Novel technique introduced: ScamAgent
Large Language Models (LLMs) have demonstrated impressive fluency and reasoning capabilities, but their potential for misuse has raised growing concern. In this paper, we present ScamAgent, an autonomous multi-turn agent built on top of LLMs, capable of generating highly realistic scam call scripts that simulate real-world fraud scenarios. Unlike prior work focused on single-shot prompt misuse, ScamAgent maintains dialogue memory, adapts dynamically to simulated user responses, and employs deceptive persuasion strategies across conversational turns. We show that current LLM safety guardrails, including refusal mechanisms and content filters, are ineffective against such agent-based threats. Even models with strong prompt-level safeguards can be bypassed when prompts are decomposed, disguised, or delivered incrementally within an agent framework. We further demonstrate the transformation of scam scripts into lifelike voice calls using modern text-to-speech systems, completing a fully automated scam pipeline. Our findings highlight an urgent need for multi-turn safety auditing, agent-level control frameworks, and new methods to detect and disrupt conversational deception powered by generative AI.
Key Contributions
- ScamAgent framework integrating LLMs with memory, goal decomposition, and TTS to simulate fully automated multi-turn scam calls without human input
- Empirical demonstration that LLM safety guardrails (refusal mechanisms, content filters) are bypassed at high rates when harmful tasks are decomposed, roleplay-framed, or delivered incrementally across conversational turns
- Multi-layered defense proposal including multi-turn moderation, persona restrictions, memory control, and intent detection for agentic LLM misuse
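The defense proposal above hinges on evaluating risk across an entire conversation rather than per message, since the paper's core finding is that harmful goals slip past filters when decomposed into individually innocuous turns. A minimal sketch of that idea follows; the class name, risk categories, and keyword heuristic are all hypothetical stand-ins (a real system would use an intent classifier, not substring matching), but the structure illustrates cumulative multi-turn moderation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of "multi-turn moderation": score the *cumulative*
# conversation so a harmful goal decomposed across turns still trips the
# filter, even when each turn looks benign in isolation. The keyword table
# is a toy stand-in for a trained intent-detection model.

RISK_MARKERS = {
    "urgency":     ("act now", "immediately", "account suspended"),
    "credentials": ("password", "ssn", "verify your identity"),
    "payment":     ("gift card", "wire transfer", "routing number"),
}

@dataclass
class MultiTurnModerator:
    threshold: int = 2                      # distinct risk categories before flagging
    seen: set = field(default_factory=set)  # categories observed so far

    def observe(self, turn: str) -> bool:
        """Record one conversational turn; return True once cumulative
        risk (number of distinct categories seen) reaches the threshold."""
        text = turn.lower()
        for category, markers in RISK_MARKERS.items():
            if any(m in text for m in markers):
                self.seen.add(category)
        return len(self.seen) >= self.threshold

mod = MultiTurnModerator()
# Each turn alone is unremarkable; together they match a scam-call pattern.
flags = [
    mod.observe("Hello, this is the bank's security team."),
    mod.observe("We sent you an account suspended notice."),
    mod.observe("Please verify your identity with your password."),
]
# flags == [False, False, True]: only the accumulated context is flagged.
```

A per-message filter would score each of these turns independently and pass all three; state carried across turns is what makes the decomposed attack visible, which is the "memory control" and "intent detection" point of the proposed defense.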