Latest papers

8 papers
attack arXiv Mar 19, 2026 · 20d ago

Automated Membership Inference Attacks: Discovering MIA Signal Computations using LLM Agents

Toan Tran, Olivera Kotevska, Li Xiong · Emory University · Oak Ridge National Laboratory

LLM-agent framework that automatically discovers novel membership inference attack strategies, achieving 0.18 AUC improvement over existing MIAs

Membership Inference Attack
PDF
attack arXiv Feb 2, 2026 · 9w ago

Exposing Vulnerabilities in Explanation for Time Series Classifiers via Dual-Target Attacks

Bohan Wang, Zewen Liu, Lu Lin et al. · Emory University · The Pennsylvania State University +2 more

Adversarially decouples time series classifier predictions from explanations, enabling targeted misclassification with plausible-looking cover-up explanations

Input Manipulation Attack timeseries
PDF
attack arXiv Dec 29, 2025 · Dec 2025

Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation

Kaustubh Dhole · Emory University

Exploits LLM intermediate attention layers to generate adversarial token substitutions that measurably degrade LLM evaluator accuracy on argument quality tasks

Input Manipulation Attack nlp
PDF
tool arXiv Dec 21, 2025 · Dec 2025

Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models

Zhang Wei, Peilu Hu, Zhenyuan Wei et al. · Independent Researcher · Ltd. +12 more

Automated red-teaming tool for LLMs using meta-prompt-guided adversarial generation, finding 3.9× more vulnerabilities than manual testing

Prompt Injection nlp
1 citations PDF
tool arXiv Nov 17, 2025 · Nov 2025

Tight and Practical Privacy Auditing for Differentially Private In-Context Learning

Yuyang Xia, Ruixuan Liu, Li Xiong · Emory University

Membership inference-based auditing framework that empirically verifies differential privacy guarantees for LLM in-context learning prompt demonstrations

Membership Inference Attack Sensitive Information Disclosure nlp
PDF
defense arXiv Sep 29, 2025 · Sep 2025

MANI-Pure: Magnitude-Adaptive Noise Injection for Adversarial Purification

Xiaoyi Huang, Junwei Wu, Kejia Zhang et al. · Xiamen University · Emory University

Frequency-adaptive diffusion purification defense targeting high-frequency adversarial noise, achieving SOTA robust accuracy on RobustBench

Input Manipulation Attack vision
PDF
tool arXiv Aug 18, 2025 · Aug 2025

Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection

Chi Wang, Min Gao, Zongwei Wang et al. · Chongqing University · Emory University +1 more

Detects LLM-generated fake news by extracting prompt-induced linguistic fingerprints from reconstructed word-level probability distributions

Output Integrity Attack nlp
PDF Code
defense arXiv Aug 5, 2025 · Aug 2025

Privacy-Aware Decoding: Mitigating Privacy Leakage of Large Language Models in Retrieval-Augmented Generation

Haoran Wang, Xiongxiao Xu, Baixiang Huang et al. · Emory University · Illinois Institute of Technology

Defends RAG systems against private data extraction by injecting calibrated noise into token logits with formal DP guarantees

Sensitive Information Disclosure nlp
PDF Code