Latest papers

6 papers
defense arXiv Apr 2, 2026

Combating Data Laundering in LLM Training

Muxing Li, Zesheng Ye, Sharon Li et al. · University of Melbourne · University of Wisconsin-Madison

Detects unauthorized LLM training data use even when original data has been laundered through style transformations

Membership Inference Attack · Sensitive Information Disclosure · nlp
PDF
defense arXiv Sep 15, 2025

Inducing Uncertainty on Open-Weight Models for Test-Time Privacy in Image Recognition

Muhammad H. Ashiq, Peter Triantafillou, Hung Yun Tseng et al. · University of Wisconsin-Madison · University of Warwick

Defends against adversarial misuse of open-weight model predictions by inducing maximal output uncertainty on protected personal instances

Output Integrity Attack · vision
PDF
benchmark arXiv Sep 10, 2025

Evaluation Awareness Scales Predictably in Open-Weights Large Language Models

Maheep Chaudhary, Ian Su, Nikhil Hooda et al. · Independent · University of California +6 more

Discovers power-law scaling of LLM evaluation awareness across 15 models, forecasting deceptive capability concealment in larger models

Prompt Injection · nlp
PDF Code
survey arXiv Sep 4, 2025

Breaking to Build: A Threat Model of Prompt-Based Attacks for Securing LLMs

Brennen Hill, Surendra Parla, Venkata Abhijeeth Balabhadruni et al. · University of Wisconsin-Madison

Surveys and categorizes prompt-based LLM attack methodologies — injection, jailbreaking, adversarial prompting — to establish a structured threat model

Prompt Injection · nlp
PDF
survey arXiv Aug 27, 2025

Intellectual Property in Graph-Based Machine Learning as a Service: Attacks and Defenses

Lincan Li, Bolin Shen, Chenxi Zhao et al. · Florida State University · Northeastern University +3 more

Surveys model theft, data reconstruction, and membership inference attacks and defenses for graph ML-as-a-service, and releases the open-source evaluation library PyGIP

Model Theft · Model Inversion Attack · Membership Inference Attack · graph
PDF Code
defense arXiv Aug 26, 2025

PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality

Nanxi Li, Zhengyue Zhao, Chaowei Xiao · University of Wisconsin-Madison

Defends VLMs against multimodal jailbreaks using safety-aware chain-of-thought reasoning trained via MCTS and DPO

Prompt Injection · multimodal · vision · nlp
PDF Code