defense 2026

Activation-Space Anchored Access Control for Multi-Class Permission Reasoning in Large Language Models

Zhaopeng Zhang ¹, Pengcheng Sun ¹, Lan Zhang ¹, Chen Tang ¹, Jiewei Lai ², Yunhao Wang ², Hui Jin ¹

¹ University of Science and Technology of China

² Lenovo

0 citations · 23 references · arXiv

Published on arXiv

2601.13630

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

AAAC reduces permission violation rates by up to 86.5% and prompt-based attack success rates by 90.7% across three LLM families with minor inference overhead.

AAAC (Activation-space Anchored Access Control)

Novel technique introduced

Large language models (LLMs) are increasingly deployed over knowledge bases for efficient knowledge retrieval and question answering. However, an LLM can inadvertently answer beyond a user's permission scope and leak sensitive content, which makes it difficult to deploy knowledge-base QA under fine-grained access control requirements. In this work, we identify a geometric regularity in intermediate activations: for the same query, the representations induced by different permission scopes form distinct, readily separable clusters. Building on this separability, we propose Activation-space Anchored Access Control (AAAC), a training-free framework for multi-class permission control. AAAC constructs an anchor bank, with one permission anchor per class, from a small offline sample set and requires no fine-tuning. At inference time, a multi-anchor steering mechanism redirects each query's activations toward the anchor-defined authorized region associated with the current user, thereby suppressing over-privileged generations by design. Extensive experiments across three LLM families show that AAAC reduces permission violation rates by up to 86.5% and prompt-based attack success rates by 90.7%, while improving response usability with minor inference overhead compared to baselines.
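The anchor bank and steering step described in the abstract can be illustrated with a minimal sketch. It assumes that each permission anchor is the mean activation of that class's offline samples and that steering is a linear interpolation toward the current user's anchor. The abstract does not specify either detail, so the function names, the `alpha` parameter, and the class labels below are hypothetical, and the paper's multi-anchor mechanism may differ.

```python
from collections import defaultdict


def build_anchor_bank(samples):
    """Return {permission_class: mean activation vector}.

    `samples` is an iterable of (permission_class, activation_vector) pairs.
    Using the per-class mean as the anchor is an assumption of this sketch.
    """
    sums = {}
    counts = defaultdict(int)
    for label, vec in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def steer(activation, anchors, user_class, alpha=0.5):
    """Move an activation toward the anchor of the user's permission class.

    alpha=0 leaves the activation unchanged; alpha=1 replaces it with the
    anchor. Linear interpolation is an illustrative choice, not the paper's
    stated update rule.
    """
    target = anchors[user_class]
    return [(1 - alpha) * a + alpha * t for a, t in zip(activation, target)]


# Toy two-dimensional activations for two hypothetical departments.
offline_samples = [
    ("hr", [1.0, 0.0]),
    ("hr", [3.0, 0.0]),
    ("finance", [0.0, 2.0]),
]
bank = build_anchor_bank(offline_samples)
steered = steer([0.0, 2.0], bank, "hr", alpha=0.5)
```

In a real deployment the vectors would be hidden states captured at a chosen layer, and the steered vector would be written back into the forward pass; both steps depend on the model and framework and are left out here.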

Key Contributions

Identifies that different permission scopes induce geometrically separable clusters in LLM intermediate activations, enabling activation-space classification of permission scope
Proposes AAAC, a training-free multi-anchor activation steering framework that redirects queries toward the authorized permission region at inference time without fine-tuning
Introduces MultiPER-Enterprise, a department-scoped enterprise QA benchmark for evaluating fine-grained access control in knowledge-base LLM deployments
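The first contribution, activation-space classification of permission scope, can be sketched as a nearest-anchor lookup. This is only an illustration: the use of cosine similarity, the helper names, and the department labels are assumptions, since the summary does not say which classifier or distance measure the authors use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_anchor(activation, anchors):
    """Return the permission class whose anchor is most similar to the activation."""
    return max(anchors, key=lambda label: cosine(activation, anchors[label]))


def outside_scope(activation, anchors, user_class):
    """True when the activation falls nearest to a class other than the user's."""
    return nearest_anchor(activation, anchors) != user_class


# Hypothetical anchors for two departments in a toy 2-D activation space.
anchors = {"hr": [1.0, 0.0], "finance": [0.0, 1.0]}
predicted = nearest_anchor([0.9, 0.1], anchors)
flagged = outside_scope([0.1, 0.9], anchors, "hr")
```

If the clusters are as separable as the paper reports, a check of this kind could flag queries whose activations sit in another department's region before any steering is applied.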

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

inference_time, black_box

Datasets

MultiPER-Enterprise

Applications

enterprise knowledge base qa, llm-based knowledge retrieval, rag systems with access control

Similar Papers

DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher

NeuroFilter: Privacy Guardrails for Conversational LLM Agents

SAFENLIDB: A Privacy-Preserving Safety Alignment Framework for LLM-based Natural Language Database Interfaces

Adaptive Backtracking for Privacy Protection in Large Language Models

DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning

PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization

Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models

LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks