Published on arXiv

2509.14608

Model Inversion Attack

OWASP ML Top 10 — ML03

Sensitive Information Disclosure

OWASP LLM Top 10 — LLM06

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Existing probabilistic defenses fundamentally fail against data exfiltration in enterprise LLMs; only deterministic access control enforced at every stage of fine-tuning and RAG inference can provably prevent sensitive data leakage to unauthorized users.

Participant-Aware Access Control

Novel technique introduced


Large language models (LLMs) are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks. We take the position that only a deterministic and rigorous enforcement of fine-grained access control during both fine-tuning and RAG-based inference can reliably prevent the leakage of sensitive data to unauthorized recipients. We introduce a framework centered on the principle that any content used in training, retrieval, or generation by an LLM is explicitly authorized for *all users involved in the interaction*. Our approach offers a simple yet powerful paradigm shift for building secure multi-user LLM systems that are grounded in classical access control but adapted to the unique challenges of modern AI workflows. Our solution has been deployed in Microsoft Copilot Tuning, a product offering that enables organizations to fine-tune models using their own enterprise-specific data.
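The retrieval-time half of this principle can be sketched in a few lines. The following is a minimal, hypothetical illustration (the `Document` class, ACL representation, and `filter_retrieved` helper are assumptions, not the paper's actual implementation): a retrieved document may enter the model's context only if *every* participant in the interaction is authorized to read it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    authorized_users: frozenset  # ACL attached to the document at ingestion

def filter_retrieved(docs, participants):
    """Keep only documents whose ACL covers *all* interaction participants.

    This is a deterministic check, not a probabilistic defense: a document
    is dropped from the context unless each participant already has read
    access, so generation can never surface content to an unauthorized user.
    """
    participants = frozenset(participants)
    return [d for d in docs if participants <= d.authorized_users]

# A document readable only by alice is excluded from a chat that also
# includes bob, regardless of how relevant the retriever scored it.
docs = [
    Document("d1", "public roadmap", frozenset({"alice", "bob"})),
    Document("d2", "alice's salary letter", frozenset({"alice"})),
]
print([d.doc_id for d in filter_retrieved(docs, {"alice", "bob"})])  # ['d1']
```

Note the subset test `participants <= d.authorized_users`: the authorization predicate is intersection over all participants, so adding any user to the conversation can only shrink, never grow, the retrievable set.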


Key Contributions

  • Demonstrates that fine-tuned enterprise LLMs are vulnerable to training data extraction attacks and that existing probabilistic defenses (sanitization, output filtering, DP) fail to provide robust protection
  • Demonstrates indirect cross-prompt injection attacks (XPIA) on RAG pipelines that succeed even under human review, exfiltrating sensitive documents via steganographic encoding in model responses
  • Proposes a deterministic participant-aware access control framework — enforcing that any content used in training or retrieval is authorized for all users involved — deployed in Microsoft Copilot Tuning
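The training-time half of the framework can be sketched the same way. This is a hedged illustration, not the Copilot Tuning implementation: one simple way to enforce the invariant at fine-tuning time is to partition the corpus by its exact ACL, fine-tune one model variant per partition, and serve a variant only to users in that partition's ACL (the `partition_by_acl` and `may_serve` helpers are assumed names).

```python
from collections import defaultdict

def partition_by_acl(corpus):
    """Group a corpus of (text, acl) pairs by their exact ACL.

    Each group is a candidate fine-tuning set; a model variant trained on
    a group has, by construction, a well-defined set of users authorized
    to see every document it internalized.
    """
    groups = defaultdict(list)
    for text, acl in corpus:
        groups[frozenset(acl)].append(text)
    return dict(groups)

def may_serve(model_acl, user):
    """Deterministic gate: a user may query a fine-tuned variant only if
    they are authorized for all of its training data."""
    return user in model_acl

corpus = [
    ("hr policy draft", frozenset({"u1"})),
    ("eng design doc", frozenset({"u1", "u2"})),
    ("hr comp bands", frozenset({"u1"})),
]
groups = partition_by_acl(corpus)
```

With this partitioning, a model inversion attack by `u2` against the HR variant is moot: `u2` is never routed to a model whose weights encode documents they cannot read.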

🛡️ Threat Analysis

Model Inversion Attack

The paper demonstrates attacks in which adversaries extract sensitive training data from fine-tuned enterprise LLMs, showing that existing probabilistic defenses (differential privacy, output filtering) fail to prevent reconstruction of confidential training content.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
black_box, training_time, inference_time, targeted
Applications
enterprise ai assistants, rag pipelines, fine-tuned llm systems, microsoft copilot