ML Security Papers

Latest papers

4 papers

defense arXiv Feb 2, 2026 · 9w ago

Alessandro De Palma · London School of Economics and Political Science · INRIA

Distills adversarially trained teachers into certifiably robust student models to improve certified robustness-accuracy trade-offs for ReLU networks

Input Manipulation Attack vision

defense arXiv Jan 29, 2026 · 9w ago

Hongyi Zhou, Jin Zhu, Kai Ye et al. · Tsinghua University · University of Birmingham +1 more

Adaptive distance-learning algorithm for detecting LLM-generated text outperforms baselines by up to 80.6% across GPT, Claude, and Gemini

Output Integrity Attack nlp

2 citations PDF

tool arXiv Jan 10, 2026 · 12w ago

Hongyi Zhou, Jin Zhu, Ying Yang et al. · Tsinghua University · University of Birmingham +1 more

Proposes LLM-generated text detector with statistical inference guarantees, outperforming watermark-free and ML-based baselines

Output Integrity Attack nlp

3 citations PDF Code

defense arXiv Sep 29, 2025 · Sep 2025

Hongyi Zhou, Jin Zhu, Pingfan Su et al. · Tsinghua University · London School of Economics and Political Science +2 more

Adaptive LLM-text detector learns a witness function from training data, improving state-of-the-art AUC by up to 37%

Output Integrity Attack nlp

5 citations 1 influential PDF Code