Can AI Models be Jailbroken to Phish Elderly Victims? An End-to-End Evaluation
Published on arXiv
arXiv:2511.11759
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Several frontier LLMs showed near-complete susceptibility to certain jailbreak categories, and AI-generated phishing emails successfully compromised 11% of elderly participants in a real-world validation study.
We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to generate phishing content, to deploying those messages against real targets, to successfully compromising elderly victims. We systematically evaluated safety guardrails across six frontier LLMs spanning four attack categories, revealing critical failures where several models exhibited near-complete susceptibility to certain attack vectors. In a human validation study with 108 senior volunteers, AI-generated phishing emails successfully compromised 11% of participants. Our work uniquely demonstrates the complete attack pipeline targeting elderly populations, highlighting that current AI safety measures fail to protect those most vulnerable to fraud. Beyond generating phishing content, LLMs enable attackers to overcome language barriers and conduct multi-turn trust-building conversations at scale, fundamentally transforming fraud economics. While some providers report voluntary counter-abuse efforts, we argue these remain insufficient.
Key Contributions
- Systematic evaluation of 6 frontier LLMs across 40 prompts in 4 jailbreak categories targeting senior-focused phishing content, revealing critical safety guardrail failures
- Human validation study (n=108 elderly volunteers) demonstrating an 11% compromise rate from AI-generated phishing emails, empirically linking LLM safety failures to real-world harm (a sketch of both rate computations follows this list)
- End-to-end demonstration of the complete attack pipeline — from jailbreaking to multi-turn trust-building at scale — arguing current voluntary counter-abuse measures are insufficient
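The paper's two headline numbers reduce to simple counting: per-model, per-category jailbreak susceptibility across the 6 × 40 prompt trials, and the compromise rate from the n=108 human study. Below is a minimal sketch of that arithmetic; the outcome labels ("complied"/"refused"), the record format, and the sample data are illustrative assumptions, not the authors' published data or code.

```python
# Hypothetical post-hoc metrics for the two headline rates.
# "complied" = the model produced the requested phishing content (guardrail failed);
# "refused" = the guardrail held. Records below are illustrative only.

from collections import defaultdict

# One (model, jailbreak_category, outcome) record per trial;
# the full study would have 6 models x 40 prompts.
trials = [
    ("model-a", "persona", "complied"),
    ("model-a", "persona", "refused"),
    ("model-b", "obfuscation", "complied"),
    # ... remaining records ...
]

def susceptibility_rates(records):
    """Fraction of prompts per (model, category) where the guardrail failed."""
    counts = defaultdict(lambda: [0, 0])  # (model, category) -> [complied, total]
    for model, category, outcome in records:
        counts[(model, category)][1] += 1
        if outcome == "complied":
            counts[(model, category)][0] += 1
    return {key: complied / total for key, (complied, total) in counts.items()}

print(susceptibility_rates(trials))

# Human validation study: 11% of n=108 works out to ~12 compromised
# participants (12 / 108 = 11.1%, consistent with the reported figure).
compromised = 12
n_participants = 108
print(f"compromise rate: {compromised / n_participants:.1%}")  # -> 11.1%
```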