Latest papers

5 papers
attack · arXiv · Jan 6, 2026

Window-based Membership Inference Attacks Against Fine-tuned Large Language Models

Yuetian Chen, Yuntao Du, Kaiyuan Zhang et al. · Purdue University · Cisco Research +1 more

A sliding-window membership inference attack (MIA) against fine-tuned LLMs that captures localized memorization signals, achieving 2-3x better detection than global-loss baselines

Membership Inference Attack nlp
PDF
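The summary contrasts windowed and global loss signals but does not give the scoring rule. The sketch below is only a rough illustration and is not the authors' method. It assumes per-token losses from the fine-tuned model are already computed. It then compares a global mean-loss score with a hypothetical windowed score, `window_loss_score`, which takes the lowest mean loss over any contiguous window (default `window=4`, also an assumption). In both scores, a higher value suggests membership.

```python
from typing import Sequence


def global_loss_score(token_losses: Sequence[float]) -> float:
    """Baseline: negated mean per-token loss over the whole sequence.

    Higher score = lower average loss = more likely a training member.
    """
    if not token_losses:
        raise ValueError("need at least one token loss")
    return -sum(token_losses) / len(token_losses)


def window_loss_score(token_losses: Sequence[float], window: int = 4) -> float:
    """Hypothetical windowed score: negated minimum mean loss over all
    contiguous windows of length `window` (clipped to the sequence length).

    A short, strongly memorized span produces one very low-loss window
    even when the sequence-wide average is dominated by ordinary text.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(token_losses)
    if n == 0:
        raise ValueError("need at least one token loss")
    w = min(window, n)
    running = sum(token_losses[:w])  # sum of the first window
    best = running
    for i in range(w, n):
        # Slide the window one token to the right in O(1).
        running += token_losses[i] - token_losses[i - w]
        best = min(best, running)
    return -best / w


# A 28-token sequence whose only memorized region is a 4-token span.
losses = [3.0] * 12 + [0.1] * 4 + [3.0] * 12
print(global_loss_score(losses))  # diluted by the 24 high-loss tokens
print(window_loss_score(losses))  # isolates the low-loss span
```

In this toy input, the global score averages the memorized span away (about -2.59), while the windowed score reflects that span directly (about -0.1). This illustrates why a localized signal can separate members that a global-loss threshold misses.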
defense · arXiv · Jan 5, 2026

SWaRL: Safeguard Code Watermarking via Reinforcement Learning

Neusha Javidnia, Ruisi Zhang, Ashish Kundu et al. · University of California · Cisco Research

RL-trained LoRA adapters embed detectable watermarks in code LLM outputs, resisting refactoring and adversarial removal attacks

Output Integrity Attack nlp
PDF
attack · arXiv · Nov 9, 2025

Rep2Text: Decoding Full Text from a Single LLM Token Representation

Haiyan Zhao, Zirui He, Fan Yang et al. · New Jersey Institute of Technology · Wake Forest University +1 more

Inverts an LLM's last-token representation to reconstruct the original input text, recovering over half of the information in 16-token sequences

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
attack · EMNLP · Oct 27, 2025

Retracing the Past: LLMs Emit Training Data When They Get Lost

Myeongseob Ko, Nikhil Reddy Billa, Adam Nguyen et al. · Virginia Tech · Cisco Research

Extracts verbatim LLM training data by optimizing prompts to spike token entropy, achieving a 22% extraction rate on Llama 2-70B

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
defense · arXiv · Oct 22, 2025

Towards Strong Certified Defense with Universal Asymmetric Randomization

Hanbin Hong, Ashish Kundu, Ali Payani et al. · University of Connecticut · Cisco Research +1 more

A certified adversarial defense using anisotropic randomized smoothing that outperforms isotropic baselines in certified accuracy by up to 182.6%

Input Manipulation Attack vision
PDF Code
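The summary does not state which noise distribution the paper's universal asymmetric randomization uses. The sketch below is therefore only general background on the isotropic-versus-anisotropic distinction, and it assumes Gaussian noise throughout. It shows the standard isotropic Gaussian certificate (Cohen et al., 2019, with p_B taken as 1 - p_A). It also shows that certificate's diagonal-covariance generalization, in which the certified region becomes an ellipsoid rather than a ball. It assumes `p_a` is already a lower confidence bound on the top-class probability of the smoothed classifier. The function names are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence

_STD_NORMAL = NormalDist()  # standard normal N(0, 1); inv_cdf is Phi^{-1}


def isotropic_radius(p_a: float, sigma: float) -> float:
    """L2 radius certified by smoothing with noise N(0, sigma^2 I).

    Uses R = sigma * Phi^{-1}(p_a); returns 0.0 (abstain) when p_a <= 0.5.
    """
    if not 0.5 < p_a < 1.0:
        return 0.0
    return sigma * _STD_NORMAL.inv_cdf(p_a)


def anisotropic_certifies(
    p_a: float, sigmas: Sequence[float], delta: Sequence[float]
) -> bool:
    """True if perturbation `delta` lies inside the certified ellipsoid for
    per-dimension noise N(0, diag(sigmas)^2), i.e. when
    ||diag(sigmas)^{-1} delta||_2 < Phi^{-1}(p_a).

    This follows from the isotropic bound after rescaling each input
    dimension by its own noise level.
    """
    if not 0.5 < p_a < 1.0:
        return False
    if len(sigmas) != len(delta):
        raise ValueError("sigmas and delta must have the same length")
    scaled_norm = sum((d / s) ** 2 for d, s in zip(delta, sigmas)) ** 0.5
    return scaled_norm < _STD_NORMAL.inv_cdf(p_a)


# Noise is large along dimension 0 and small along dimension 1.
print(isotropic_radius(0.9, 0.5))                          # ~0.64
print(anisotropic_certifies(0.9, [1.0, 0.1], [1.0, 0.0]))  # True
print(anisotropic_certifies(0.9, [1.0, 0.1], [0.0, 0.2]))  # False
```

An isotropic certificate is limited by a single noise scale. Per-dimension scales instead let the certified region stretch along directions where more noise is tolerable, which is the general motivation behind moving away from isotropic baselines.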