UNSEEN: A Cross-Stack LLM Unlearning Defense against AR-LLM Social Engineering Attacks
Tianlong Yu 1, Yang Yang 1, Xiao Luo 1, Lihong Liu 1, Fudu Xing 2, Zui Tao 3, Kailong Wang 3, Gaoyang Liu 3, Ting Bi 3
Published on arXiv
2604.23141
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Key Finding
Evaluated in an IRB-approved user study with 60 participants, demonstrating effective defense against AR-LLM social engineering attacks
UNSEEN
Novel technique introduced
Emerging AR-LLM-based social engineering (AR-LLM-SE) attacks, such as SEAR, are on the verge of posing serious threats to real-world social life. In such an attack, the attacker uses AR (Augmented Reality) glasses to capture the target's image and voice, uses an LLM to identify the target and generate a social profile, and uses LLM agents to apply social engineering strategies that suggest conversation lines, winning the target's trust before phishing. Current defensive approaches, such as role-based access control or data flow tracking, are not directly applicable to the converged AR-LLM ecosystem, given embedded AR devices and opaque LLM inference. This leaves a potent emerging social engineering threat that existing privacy paradigms are ill-equipped to address, and it calls for a shift beyond solely human-centric measures such as legislation and user education toward enforceable vendor policies and platform-level restrictions. Realizing this vision, however, faces significant technical challenges: securing resource-constrained embedded AR devices, implementing fine-grained access control within opaque LLM inference, and governing adaptive interactive agents. To address these challenges, we present UNSEEN, a coordinated cross-stack defense that combines an AR ACL (Access Control Layer) for identity-gated sensing, F-RMU-based LLM unlearning for sensitive profile suppression, and runtime agent guardrails for adaptive interaction control. We evaluate UNSEEN in an IRB-approved user study with 60 participants and a dataset of 360 annotated conversations across realistic social scenarios.
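To make "identity-gated sensing" concrete, the following is a minimal sketch of a deny-by-default access check. The abstract does not describe how the AR ACL is implemented, so the `SensingACL` class, the consent registry, and the capability names (`display`, `identify`, `profile`) are all hypothetical and are used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SensingACL:
    """Hypothetical deny-by-default gate for AR sensing requests."""

    # Assumed consent registry: identities that opted in to recognition.
    consented_ids: set = field(default_factory=set)

    def allow(self, subject_id, capability):
        # In this sketch, plain display of the camera feed is always allowed.
        if capability == "display":
            return True
        # Identity-linked capabilities need a known subject who has consented.
        return subject_id is not None and subject_id in self.consented_ids


acl = SensingACL(consented_ids={"alice"})
decisions = {
    ("alice", "identify"): acl.allow("alice", "identify"),
    ("bob", "identify"): acl.allow("bob", "identify"),
    (None, "profile"): acl.allow(None, "profile"),
    ("bob", "display"): acl.allow("bob", "display"),
}
```

The point of the sketch is the default: any request that would link sensor data to a person is refused unless an explicit rule permits it, so an unrecognized or non-consenting subject never reaches the LLM profiling stage.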
Key Contributions
- Cross-stack defense combining AR access control, LLM unlearning for identity suppression, and agent guardrails
- F-RMU-based unlearning method to suppress sensitive profile information in LLM responses
- Runtime agent guardrails for adaptive interaction control in social engineering scenarios
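The summary does not specify how F-RMU works. As background only, the sketch below shows the general shape of an RMU-style (Representation Misdirection for Unlearning) objective from prior work, on which the name suggests F-RMU builds: activations on forget data are pushed toward a scaled fixed random direction, while activations on retain data are kept close to those of a frozen copy of the model. The function names, the toy vectors, and the values of `c` and `alpha` are placeholder assumptions, and mean squared error is used in place of a squared norm (they differ by a constant factor).

```python
import math
import random


def mse(a, b):
    """Mean squared error between two equal-length activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def random_unit_vector(dim, seed=0):
    """Fixed random direction u with unit L2 norm."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rmu_style_loss(h_forget, h_retain, h_retain_frozen, u, c=2.0, alpha=1.0):
    """Generic RMU-style objective (not the paper's F-RMU).

    forget term: pull forget-set activations toward c * u
    retain term: keep retain-set activations near the frozen model's
    """
    target = [c * x for x in u]
    forget_term = mse(h_forget, target)
    retain_term = alpha * mse(h_retain, h_retain_frozen)
    return forget_term + retain_term


# Toy 4-dimensional activations (placeholder values).
c = 2.0
u = random_unit_vector(4)
h_retain = [0.1, -0.2, 0.3, 0.0]

# Forget activations already at the target and retain activations unchanged.
loss_at_target = rmu_style_loss([c * x for x in u], h_retain, h_retain, u, c=c)
# Forget activations at the origin: the forget term is c**2 / dim.
loss_at_origin = rmu_style_loss([0.0] * 4, h_retain, h_retain, u, c=c)
```

In a real setting the vectors would be hidden states from a chosen layer of the model, and the loss would be minimized over that layer's weights; the toy version only shows which quantities the two terms compare.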
🛡️ Threat Analysis
UNSEEN defends against social engineering attacks in which LLM agents steer a conversation to win the target's trust and then phish, that is, LLM-driven manipulation of user interaction through adaptive conversational strategies.
It also addresses the security of LLM agents that carry out adaptive social engineering interactions, proposing runtime agent guardrails that limit excessive agency and control adaptive interaction.
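A runtime guardrail of the kind described above can be pictured as a filter between the agent and the AR display. The sketch below is an illustration only: the `guard` function, the block patterns, and the example suggestions are hypothetical, and the summary does not state what policy UNSEEN's guardrails actually enforce.

```python
import re

# Hypothetical phishing-style patterns; a real policy would be far richer.
BLOCK_PATTERNS = [
    re.compile(r"https?://", re.IGNORECASE),  # unsolicited links
    re.compile(r"\b(password|verification code)\b", re.IGNORECASE),
]


def guard(suggestion, sensitive_terms):
    """Return (allowed, reason) for one agent-proposed conversation line."""
    lowered = suggestion.lower()
    # Block lines that surface protected details from the target's profile.
    for term in sensitive_terms:
        if term.lower() in lowered:
            return False, f"references protected profile detail: {term}"
    # Block lines that look like a phishing step.
    for pattern in BLOCK_PATTERNS:
        if pattern.search(suggestion):
            return False, "matches phishing-style pattern"
    return True, "ok"


protected = ["Acme Corp"]  # assumed sensitive profile field for the example
small_talk = guard("Ask how her weekend was.", protected)
profile_leak = guard("Mention that you also worked at Acme Corp.", protected)
phishing = guard("Send her this link: http://example.test/login", protected)
```

Here `small_talk` passes, while `profile_leak` and `phishing` are rejected before they reach the wearer, which is the sense in which a guardrail constrains what the agent is allowed to do at runtime.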