Latest papers

5 papers
defense · arXiv · Feb 12, 2026

Community Concealment from Unsupervised Graph Learning-Based Clustering

Dalyapraz Manatova, Pablo Moriano, L. Jean Camp · Indiana University · Oak Ridge National Laboratory +1 more

Evades GNN community detection by perturbing graph edges and node features to conceal sensitive communities from unsupervised clustering

Input Manipulation Attack · graph
PDF
benchmark · arXiv · Jan 7, 2026

Analyzing Reasoning Shifts in Audio Deepfake Detection under Adversarial Attacks: The Reasoning Tax versus Shield Bifurcation

Binh Nguyen, Thai Le · Indiana University · Independent Researcher

Benchmarks reasoning robustness of audio deepfake detectors under adversarial attack, revealing a shield-vs-tax bifurcation based on acoustic perception quality

Input Manipulation Attack · Output Integrity Attack · audio · nlp
1 citation · PDF
defense · arXiv · Nov 24, 2025

Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization

Xurui Li, Kaisong Song, Rui Zhu et al. · Fudan University · Alibaba Group +3 more

Co-evolving attack-defense framework uses MCTS-based jailbreak exploration and curriculum RL to jointly train stronger LLM safety alignment

Prompt Injection · nlp
2 citations · PDF · Code
benchmark · arXiv · Oct 29, 2025

The Limits of Obliviate: Evaluating Unlearning in LLMs via Stimulus-Knowledge Entanglement-Behavior Framework

Aakriti Shah, Thai Le · University of Southern California · Indiana University

Persuasive authority framing recovers supposedly unlearned factual knowledge from LLMs, exposing critical gaps in unlearning completeness

Model Inversion Attack · Sensitive Information Disclosure · nlp
PDF
benchmark · arXiv · Jan 3, 2025

Towards Robust and Accurate Stability Estimation of Local Surrogate Models in Text-based Explainable AI

Christopher Burger, Charles Walter, Thai Le et al. · University of Mississippi · Indiana University +1 more

Proposes synonymity-weighted similarity metrics for measuring the adversarial robustness of NLP explanation methods such as LIME

Input Manipulation Attack · nlp
1 citation · PDF