defense 2026

UNSEEN: A Cross-Stack LLM Unlearning Defense against AR-LLM Social Engineering Attacks

Tianlong Yu ¹, Yang Yang ¹, Xiao Luo ¹, Lihong Liu ¹, Fudu Xing ², Zui Tao ³, Kailong Wang ³, Gaoyang Liu ³, Ting Bi ³

¹ Hubei University

² University of Southern California

³ Huazhong University of Science and Technology

0 citations

Published on arXiv

2604.23141

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Evaluated in an IRB-approved user study with 60 participants, demonstrating effective defense against AR-LLM social engineering attacks

UNSEEN

Novel technique introduced

Emerging AR-LLM-based social engineering attacks (e.g., SEAR) are on the verge of posing serious threats to real-world social life. In such an AR-LLM-SE attack, the attacker uses AR (Augmented Reality) glasses to capture the target's image and voice, uses the LLM to identify the target and generate a social profile, and uses LLM agents to apply social engineering strategies that suggest conversation lines to win the target's trust and then perform phishing. Current defensive approaches, such as role-based access control or data flow tracking, are not directly applicable to the convergent AR-LLM ecosystem (given embedded AR devices and opaque LLM inference), leaving an emerging and potent social engineering threat that existing privacy paradigms are ill-equipped to address. This necessitates a shift beyond solely human-centric measures like legislation and user education toward enforceable vendor policies and platform-level restrictions. Realizing this vision, however, faces significant technical challenges: securing resource-constrained embedded AR devices, implementing fine-grained access control within opaque LLM inference, and governing adaptive interactive agents. To address these challenges, we present UNSEEN, a coordinated cross-stack defense that combines an AR ACL (Access Control Layer) for identity-gated sensing, F-RMU-based LLM unlearning for sensitive profile suppression, and runtime agent guardrails for adaptive interaction control. We evaluate UNSEEN in an IRB-approved user study with 60 participants and a dataset of 360 annotated conversations across realistic social scenarios.

Key Contributions

Cross-stack defense combining AR access control, LLM unlearning for identity suppression, and agent guardrails
F-RMU-based unlearning method to suppress sensitive profile information in LLM responses
Runtime agent guardrails for adaptive interaction control in social engineering scenarios
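
The layers listed above can be pictured as a pipeline in which each stage may veto or sanitize what the next stage sees. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Frame`, `acl_gate`, `suppress_profile`, `pipeline`, the consent set, and the sensitive-field list are all hypothetical, and the unlearning stage is stood in for by a simple field filter rather than F-RMU itself.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical: identities that have opted in to being recognized.
CONSENTED = {"alice"}

# Hypothetical: profile fields treated as sensitive.
SENSITIVE_FIELDS = {"home_address", "employer", "relatives"}


@dataclass
class Frame:
    # Identity label from an on-device recognizer (assumed to exist).
    subject_id: str
    # Profile-like data derived from the captured image and audio.
    payload: dict = field(default_factory=dict)


def acl_gate(frame: Frame) -> Optional[Frame]:
    """AR access-control layer: drop sensor data for non-consenting subjects."""
    return frame if frame.subject_id in CONSENTED else None


def suppress_profile(profile: dict) -> dict:
    """Stand-in for the unlearning layer: strip sensitive fields.

    An unlearned model would fail to produce these fields at all;
    this filter only mimics the observable effect.
    """
    return {k: v for k, v in profile.items() if k not in SENSITIVE_FIELDS}


def pipeline(frame: Frame) -> Optional[dict]:
    """Run a frame through the gate, then through profile suppression."""
    gated = acl_gate(frame)
    if gated is None:
        return None  # nothing reaches the LLM for this subject
    return suppress_profile(gated.payload)
```

In this sketch a frame for a subject outside `CONSENTED` never reaches the later stages, while a consenting subject's data still passes through with the sensitive fields removed.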

🛡️ Threat Analysis

Prompt Injection

Defends against social engineering attacks in which LLM agents manipulate a conversation to win the target's trust and then perform phishing. The threat is LLM-driven manipulation of human interaction through adaptive conversational strategies.

Excessive Agency

Addresses the security of LLM agents that carry out adaptive social engineering interactions, and proposes runtime agent guardrails to limit excessive agency and control adaptive interactions.
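
One way to picture a runtime agent guardrail is as a check applied to each conversation suggestion before it is shown to the wearer. The following is a minimal sketch assuming a pattern-based policy; `BLOCKED_PATTERNS`, `REFUSAL`, and `guard_suggestion` are hypothetical illustrations, and the source does not specify how UNSEEN's guardrails are implemented.

```python
import re

# Hypothetical patterns for phishing-style suggestions.
BLOCKED_PATTERNS = [
    re.compile(r"\b(password|verification code|bank account)\b", re.IGNORECASE),
    re.compile(r"\bclick (this|the) link\b", re.IGNORECASE),
]

# Hypothetical placeholder returned in place of a blocked suggestion.
REFUSAL = "[suggestion withheld by guardrail]"


def guard_suggestion(suggestion: str) -> str:
    """Return the suggestion unchanged, or REFUSAL if any blocked pattern matches."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(suggestion):
            return REFUSAL
    return suggestion
```

A static pattern list is only the simplest possible policy; a guardrail meant to handle adaptive agents would likely need to consider conversation history as well as individual messages.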

Details

Domains

multimodal, nlp

Model Types

llm, multimodal

Threat Tags

inference_time, black_box

Datasets

360 annotated conversations across realistic social scenarios

Applications

social engineering defense, ar-llm systems, privacy protection

Similar Papers

Who Grants the Agent Power? Defending Against Instruction Injection via Task-Centric Access Control

Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems

Enhancing Reliability in LLM-Integrated Robotic Systems: A Unified Approach to Security and Safety

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

Is Monitoring Enough? Strategic Agent Selection For Stealthy Attack in Multi-Agent Discussions

Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System

Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks

Measuring the Security of Mobile LLM Agents under Adversarial Prompts from Untrusted Third-Party Channels