SoK: Exposing the Generation and Detection Gaps in LLM-Generated Phishing Through Examination of Generation Methods, Content Characteristics, and Countermeasures
Fengchao Chen 1,2, Tingmin Wu 2, Van Nguyen 1,2, Carsten Rudolph 1
Published on arXiv (2508.21457)
Output Integrity Attack
OWASP ML Top 10 — ML09
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
LLM-generated phishing achieves roughly 30% higher click-through rates than human-written phishing, and the field exhibits a structural offense-defense asymmetry: offensive techniques adapt dynamically while defenses remain static and reactive.
Phishing campaigns involve adversaries masquerading as trusted vendors to trigger user behavior that enables the exfiltration of private data. While URLs are an important part of phishing campaigns, communicative elements such as text and images are central to triggering the required user behavior. As phishing detection has advanced, attackers have responded by scaling campaigns and by diversifying and personalizing content. Beyond established mechanisms such as template-based generation, large language models (LLMs) can now be used for phishing content generation, enabling attacks to scale in minutes and challenging existing detection paradigms through personalized content, stealthy avoidance of explicit phishing keywords, and dynamic adaptation to diverse attack scenarios. Countering these rapidly changing campaigns requires a comprehensive understanding of the complex LLM-related threat landscape, yet existing studies are fragmented and focus on specific areas. In this work, we provide the first holistic examination of LLM-generated phishing content. First, to trace the exploitation pathways of LLMs for phishing content generation, we adopt a modular taxonomy documenting nine stages by which adversaries breach LLM safety guardrails. Second, we characterize how LLM-generated phishing manifests as threats, revealing that it evades detectors while emphasizing human cognitive manipulation. Third, by taxonomizing defense techniques aligned with generation methods, we expose a critical asymmetry: offensive mechanisms adapt dynamically to attack scenarios, whereas defensive strategies remain static and reactive. Finally, based on a thorough analysis of the existing literature, we highlight insights and gaps and suggest a roadmap for understanding and countering LLM-driven phishing at scale.
Key Contributions
- Modular taxonomy of nine stages by which adversaries systematically breach LLM safety guardrails to generate phishing content
- Characterization of LLM-generated phishing content and how it evades detectors while targeting human cognitive manipulation
- Taxonomy of defense techniques aligned with generation methods, exposing a critical asymmetry between dynamic offensive mechanisms and static/reactive defensive strategies
🛡️ Threat Analysis
The paper systematically analyzes how LLM-generated phishing content evades existing detectors, covering evasion of AI-generated-content detection and the asymmetry between dynamically adapting offensive LLM generation and static defensive detection strategies. These concerns map to output integrity and AI-generated content detection.
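To make the detection-evasion claim concrete, here is a minimal illustrative sketch (not from the paper): a static keyword-based phishing filter of the brittle, reactive kind the SoK critiques. The trigger phrases and example lures below are invented for illustration; the point is that a template-style lure trips the filter while an LLM-style paraphrase conveying the same request does not.

```python
# Illustrative sketch: a static keyword filter, representative of the
# reactive defenses the SoK argues LLM-generated phishing evades.
# Trigger phrases and example emails are hypothetical.

TRIGGER_PHRASES = {
    "verify your account",
    "urgent action required",
    "click here to confirm",
    "your password has expired",
}

def keyword_flag(email_text: str) -> bool:
    """Flag an email if it contains any known phishing trigger phrase."""
    text = email_text.lower()
    return any(phrase in text for phrase in TRIGGER_PHRASES)

# A template-style lure contains explicit phishing keywords and is caught.
template_lure = "Urgent action required: verify your account within 24 hours."

# An LLM-style paraphrase makes the same credential ask without any
# trigger phrase, so the static filter misses it.
paraphrased_lure = (
    "Hi Dana, while reviewing Q3 access logs I noticed your credentials "
    "are due for routine revalidation. Could you complete the short form "
    "below before Friday's audit?"
)

print(keyword_flag(template_lure))     # True
print(keyword_flag(paraphrased_lure))  # False
```

The asymmetry the paper identifies follows directly: the defender's phrase list is fixed until manually updated, while an attacker can regenerate unlimited paraphrases on demand.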