Latest papers

211 papers
defense arXiv Apr 2, 2026 · 4d ago

Combating Data Laundering in LLM Training

Muxing Li, Zesheng Ye, Sharon Li et al. · University of Melbourne · University of Wisconsin-Madison

Detects unauthorized LLM training data use even when original data has been laundered through style transformations

Membership Inference Attack Sensitive Information Disclosure nlp
PDF
benchmark arXiv Apr 1, 2026 · 5d ago

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi · Keysight Technologies

Automated framework evaluating LLM system instruction leakage via encoding attacks, achieving 70%+ success rates with structured formats

Sensitive Information Disclosure Prompt Injection nlp
PDF Code
attack arXiv Mar 26, 2026 · 11d ago

Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models

Eyal Hadad, Mordechai Guri · Ben-Gurion University of the Negev

Side-channel attack extracting image geometry and semantic content from local VLMs via timing and cache contention analysis

Output Integrity Attack Sensitive Information Disclosure multimodalvision
PDF
survey arXiv Mar 25, 2026 · 12d ago

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan · University of Central Florida · University of Copenhagen

Unified taxonomy of ML security threats organizing attacks into data-to-data, data-to-model, model-to-data, and model-to-model categories

Input Manipulation Attack Data Poisoning Attack Model Inversion Attack Membership Inference Attack Model Theft Output Integrity Attack Model Poisoning Prompt Injection Sensitive Information Disclosure visionnlpmultimodal
PDF
defense arXiv Mar 25, 2026 · 12d ago

DP^2-VL: Private Photo Dataset Protection by Data Poisoning for Vision-Language Models

Hongyi Miao, Jun Jia, Xincheng Wang et al. · Shandong University · Shanghai Jiao Tong University +4 more

Data poisoning defense that protects private photo datasets from VLM fine-tuning attacks that extract identity-affiliation relationships

Data Poisoning Attack Sensitive Information Disclosure visionnlpmultimodal
PDF
defense arXiv Mar 24, 2026 · 13d ago

Chain-of-Authorization: Internalizing Authorization into Large Language Models via Reasoning Trajectories

Yang Li, Yule Liu, Xinlei He et al. · Tsinghua University · The Hong Kong University of Science and Technology +1 more

Fine-tunes LLMs to generate explicit authorization reasoning chains before responses, defending against unauthorized access and prompt injection

Prompt Injection Sensitive Information Disclosure nlp
PDF
survey arXiv Mar 23, 2026 · 14d ago

Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks

Yanming Mu, Hao Hu, Feiyang Li et al. · State Key Laboratory of Mathematical Engineering and Advanced Computing · Information Engineering University +2 more

First end-to-end survey mapping RAG security threats, defenses, and benchmarks across the entire pipeline

Prompt Injection Training Data Poisoning Sensitive Information Disclosure nlp
PDF
defense arXiv Mar 18, 2026 · 19d ago

SEAL-Tag: Self-Tag Evidence Aggregation with Probabilistic Circuits for PII-Safe Retrieval-Augmented Generation

Jin Xie, Songze Li, Guang Cheng

Defense framework preventing PII extraction from RAG systems via structured evidence tables and probabilistic circuits for policy enforcement

Sensitive Information Disclosure Prompt Injection nlp
PDF
attack arXiv Mar 17, 2026 · 20d ago

SOMP: Scalable Gradient Inversion for Large Language Models via Subspace-Guided Orthogonal Matching Pursuit

Yibo Li, Qiongxiu Li · Politecnico di Milano · Aalborg University

Scalable gradient inversion attack recovering private training text from aggregated LLM gradients in federated learning settings

Model Inversion Attack Sensitive Information Disclosure nlpfederated-learning
PDF
defense arXiv Mar 13, 2026 · 24d ago

Neural Gate: Mitigating Privacy Risks in LVLMs via Neuron-Level Gradient Gating

Xiangkui Cao, Jie Zhang, Meina Kan et al. · Institute of Computing Technology · University of Chinese Academy of Sciences

Neuron-level model editing technique that teaches vision-language models to refuse privacy-invasive queries while preserving utility

Sensitive Information Disclosure Prompt Injection multimodalnlpvision
PDF
attack arXiv Mar 12, 2026 · 25d ago

Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems

Sarbartha Banerjee, Prateek Sahu, Anjo Vahldiek-Oberwagner et al. · Georgia Tech · The University of Texas at Austin +3 more

Compounds Rowhammer hardware faults and RAG database injection with LLM attacks to jailbreak guardrails and exfiltrate user data

Prompt Injection Sensitive Information Disclosure nlp
PDF
benchmark arXiv Mar 11, 2026 · 26d ago

The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan et al. · Georgia Institute of Technology · Stanford University +1 more

Exposes LLM unlearning brittleness by showing multi-hop and alias queries recover supposedly forgotten information missed by static benchmarks

Sensitive Information Disclosure nlp
PDF Code
defense arXiv Mar 11, 2026 · 26d ago

CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems

Panagiotis Georgios Pennas, Konstantinos Papaioannou, Marco Guarnieri et al. · IMDEA Software Institute · Universidad Politécnica de Madrid

Defends multi-tenant LLM servers against KV cache timing side channels that let attackers reconstruct other users' prompts

Sensitive Information Disclosure nlp
PDF
defense arXiv Mar 11, 2026 · 26d ago

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs

Chuan Guo, Juan Felipe Ceron Uribe, Sicheng Zhu et al. · OpenAI

Proposes a reinforcement learning dataset that trains LLMs to resist jailbreaks, prompt injection, and system prompt extraction via instruction hierarchy

Prompt Injection Sensitive Information Disclosure nlp
PDF Code
defense arXiv Mar 10, 2026 · 27d ago

FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation

Yinpeng Wu, Yitong Chen, Lixiang Wang et al. · Shanghai Jiao Tong University

TEE-based LLM serving system that protects model weights and user data from compromised OS kernels on mobile devices

Model Theft Sensitive Information Disclosure nlp
PDF
attack arXiv Mar 10, 2026 · 27d ago

CLIOPATRA: Extracting Private Information from LLM Insights

Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, Peter Kairouz · arXiv · University College London +1 more

Attacks Anthropic's Clio LLM analytics platform by injecting crafted chats to extract private medical history of target users, bypassing layered privacy protections

Sensitive Information Disclosure Prompt Injection nlp
PDF Code
defense arXiv Mar 9, 2026 · 28d ago

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models

Nikita Kuzmin, Tao Zhong, Jiajun Deng et al. · Nanyang Technological University · A*STAR +3 more

Defends against speaker re-identification attacks on LLM speech dialogue models using streaming voice anonymization

Sensitive Information Disclosure audionlp
PDF
benchmark arXiv Mar 9, 2026 · 28d ago

The Conundrum of Trustworthy Research on Attacking Personally Identifiable Information Removal Techniques

Sebastian Ochs, Ivan Habernal · Trustworthy Human Language Technologies · Technical University of Darmstadt +2 more

Critiques PII reconstruction attack evaluations, showing data leakage and LLM memorization inflate reported attack success rates

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
defense arXiv Mar 9, 2026 · 28d ago

SplitAgent: A Privacy-Preserving Distributed Architecture for Enterprise-Cloud Agent Collaboration

Jianshu She · MBZUAI

Defends enterprise LLM agents against data leakage by splitting sensitive handling from cloud reasoning with context-aware sanitization

Sensitive Information Disclosure Insecure Plugin Design nlp
PDF
defense arXiv Mar 5, 2026 · 4w ago

Good-Enough LLM Obfuscation (GELO)

Anatoly Belikov, Ilya Fedotov · SingularityNET Foundation · Singularity Compute

Defends LLM prompt privacy on shared accelerators by obfuscating hidden states with per-batch invertible mixing inside a TEE

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
Loading more papers…