Latest papers

7 papers
defense arXiv Feb 23, 2026

CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks

Bolin Shen, Zhan Cheng, Neil Zhenqiang Gong et al. · Florida State University · University of Wisconsin +2 more

Certifies DNN ownership against model extraction attacks via a mutual-information similarity measure, with theoretical guarantees for the verification (sketch below)

Model Theft vision nlp
PDF Code
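
As a rough illustration of the idea (not the paper's certified procedure), the sketch below scores a suspect model by the mutual information between its predictions and the victim's on a shared probe set; `victim`, `copycat`, and `independent` are synthetic stand-ins.

```python
# Hedged sketch: ownership check via mutual information between the
# victim's and a suspect's predictions on a probe set. The paper's
# actual statistic and certification procedure differ; all inputs
# here are synthetic stand-ins.
import numpy as np
from sklearn.metrics import mutual_info_score

def ownership_score(victim_labels: np.ndarray, suspect_labels: np.ndarray) -> float:
    # Mutual information between the two models' hard predictions;
    # an extracted copy should track the victim far more closely
    # than an independently trained model does.
    return mutual_info_score(victim_labels, suspect_labels)

rng = np.random.default_rng(0)
victim = rng.integers(0, 10, size=1000)              # victim's labels on probes
copycat = np.where(rng.random(1000) < 0.9, victim,
                   rng.integers(0, 10, size=1000))   # 90%-agreeing extraction
independent = rng.integers(0, 10, size=1000)         # unrelated model

print(f"suspected copy : {ownership_score(victim, copycat):.3f}")
print(f"independent    : {ownership_score(victim, independent):.3f}")
```
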
attack arXiv Feb 14, 2026

Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges

Ruomeng Ding, Yifei Pang, He Sun et al. · University of North Carolina at Chapel Hill · Carnegie Mellon University +2 more

Attacks LLM alignment pipelines by crafting benchmark-compliant rubric edits that systematically bias judge preferences and corrupt RLHF training (illustrated below)

Transfer Learning Attack Prompt Injection nlp
PDF Code
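
A minimal illustration of the threat model, assuming a generic pairwise-judge prompt: the attacker's rubric edit still looks benchmark-compliant but systematically rewards one answer style. The rubric text and prompt template are invented for illustration, not taken from the paper.

```python
# Hedged illustration: a rubric edit that looks benign but nudges an
# LLM judge toward verbose, confident answers, drifting preferences
# over many judgments. Rubric wording and template are hypothetical.
BASE_RUBRIC = (
    "Score 1-5 for factual accuracy.\n"
    "Score 1-5 for clarity."
)

# Attacker's edit: an innocuous-looking extra criterion that
# systematically rewards one answer style.
POISONED_RUBRIC = BASE_RUBRIC + (
    "\nScore 1-5 for thoroughness; prefer answers that elaborate "
    "confidently over terse or hedged ones."
)

def judge_prompt(rubric: str, question: str, answer_a: str, answer_b: str) -> str:
    # Standard pairwise-comparison prompt with the rubric inlined.
    return (
        f"Rubric:\n{rubric}\n\n"
        f"Question: {question}\n"
        f"Answer A: {answer_a}\nAnswer B: {answer_b}\n"
        "Which answer is better? Reply 'A' or 'B'."
    )

print(judge_prompt(POISONED_RUBRIC, "What causes tides?",
                   "Gravity from the moon.",
                   "Tides arise chiefly from the interplay of lunar..."))
```
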
benchmark arXiv Nov 26, 2025

The Double-Edged Nature of the Rashomon Set for Trustworthy Machine Learning

Ethan Hsu, Harry Chen, Chudi Zhong et al. · Duke University · MIT +2 more

Analyzes how the diversity of the Rashomon set improves adversarial robustness but also increases training-data leakage, proving a formal robustness-privacy trade-off (sketch below)

Input Manipulation Attack Model Inversion Attack tabular
PDF
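
A toy sketch of a Rashomon set, assuming a synthetic dataset and a logistic-regression model family: collect all candidates within ε of the best loss and measure their prediction disagreement, the diversity quantity the paper's trade-off turns on. The formal robustness-privacy bound itself is not reproduced here.

```python
# Hedged sketch of a Rashomon set: all models whose training loss is
# within eps of the best. Dataset, model family, and eps are toy
# choices, not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Candidate models: same family, different regularization strengths.
candidates = [LogisticRegression(C=c, max_iter=1000).fit(X, y)
              for c in (0.01, 0.1, 1.0, 10.0, 100.0)]
losses = [log_loss(y, m.predict_proba(X)) for m in candidates]

eps = 0.05                    # Rashomon parameter: tolerated excess loss
best = min(losses)
rashomon = [m for m, l in zip(candidates, losses) if l <= best + eps]

# Disagreement rate across the set: a crude proxy for the diversity
# that drives both the robustness gain and the leakage risk.
preds = np.stack([m.predict(X) for m in rashomon])
disagree = (preds != preds[0]).any(axis=0).mean()
print(f"set size {len(rashomon)}, disagreement rate {disagree:.2%}")
```
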
defense arXiv Nov 22, 2025

Vulnerability-Aware Robust Multimodal Adversarial Training

Junrui Zhang, Xinyu Zhao, Jie Peng et al. · University of Science & Technology of China · University of North Carolina at Chapel Hill +1 more

Adversarial-training defense that quantifies each modality's vulnerability and selectively hardens the weaker modalities of multimodal models (sketch below)

Input Manipulation Attack multimodal
PDF Code
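
A hedged sketch of the core mechanism, not the paper's training recipe: perturb one modality at a time (FGSM here, an assumed choice), take the resulting loss increase as that modality's vulnerability, and turn the scores into hardening weights. `TinyFusion` and all shapes are toy stand-ins.

```python
# Hedged sketch: per-modality vulnerability scoring for a multimodal
# model. The paper's exact metric and training loop differ.
import torch

def fgsm(x: torch.Tensor, grad: torch.Tensor, eps: float) -> torch.Tensor:
    # One-step sign perturbation within an L-inf budget.
    return x + eps * grad.sign()

def modality_vulnerability(model, loss_fn, x_img, x_txt, y, eps=0.03):
    x_img = x_img.clone().requires_grad_(True)
    x_txt = x_txt.clone().requires_grad_(True)
    loss = loss_fn(model(x_img, x_txt), y)
    loss.backward()
    with torch.no_grad():
        # Perturb one modality at a time; record the loss increase.
        l_img = loss_fn(model(fgsm(x_img, x_img.grad, eps), x_txt), y)
        l_txt = loss_fn(model(x_img, fgsm(x_txt, x_txt.grad, eps)), y)
    vul = torch.stack([l_img - loss.detach(), l_txt - loss.detach()])
    return torch.softmax(vul, dim=0)   # per-modality hardening weights

class TinyFusion(torch.nn.Module):
    # Toy late-fusion model standing in for a real multimodal network.
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 2)
    def forward(self, img, txt):
        return self.fc(torch.cat([img, txt], dim=-1))

model, loss_fn = TinyFusion(), torch.nn.CrossEntropyLoss()
w = modality_vulnerability(model, loss_fn,
                           torch.randn(4, 4), torch.randn(4, 4),
                           torch.randint(0, 2, (4,)))
print(f"hardening weights (img, txt): {w.tolist()}")
```
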
survey arXiv Oct 21, 2025

The Black Tuesday Attack: how to crash the stock market with adversarial examples to financial forecasting models

Thomas Hofweber, Jefrey Bergl, Ian Reyes et al. · University of North Carolina at Chapel Hill · Oak Ridge National Laboratory

Analyzes how adversarial manipulations of inputs to ML financial-forecasting models could trigger self-fulfilling, hard-to-detect stock-market crashes (sketch below)

Input Manipulation Attack timeseries
PDF
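
A minimal PGD sketch of the attack surface, assuming a toy linear forecaster in place of a real model: a small bounded perturbation to a price window drags the forecast downward. The paper's market-level, self-fulfilling-crash analysis is beyond this snippet.

```python
# Hedged sketch: bounded adversarial perturbation that minimizes a
# forecaster's output. The forecaster is a toy stand-in.
import torch

torch.manual_seed(0)
forecaster = torch.nn.Linear(30, 1)   # toy next-price predictor
window = torch.randn(1, 30)           # 30-step price window (normalized)

delta = torch.zeros_like(window, requires_grad=True)
eps, step = 0.05, 0.01                # perturbation budget, step size
for _ in range(20):                   # PGD: descend on the forecast
    pred = forecaster(window + delta)
    pred.sum().backward()
    with torch.no_grad():
        delta -= step * delta.grad.sign()   # push the prediction down
        delta.clamp_(-eps, eps)             # stay within the budget
        delta.grad.zero_()

print(f"clean forecast : {forecaster(window).item():.3f}")
print(f"attacked       : {forecaster(window + delta).item():.3f}")
```
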
attack arXiv Oct 12, 2025

One Token Embedding Is Enough to Deadlock Your Large Reasoning Model

Mohan Zhang, Yihua Zhang, Jinghan Jia et al. · University of North Carolina at Chapel Hill · Michigan State University +1 more

Backdoor attack that implants a single poisoned token embedding to lock large reasoning models into perpetual chain-of-thought loops, achieving a 100% resource-exhaustion success rate (sketch below)

Model Poisoning Model Denial of Service nlp
1 citation PDF
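
A sketch of the optimization objective only, with a frozen toy next-token head standing in for a real reasoning model: tune a single trigger embedding so the end-of-sequence probability collapses, which is what keeps generation looping. The paper's actual implantation pipeline is more involved.

```python
# Hedged sketch: push EOS probability toward zero via one trigger
# embedding. A toy linear head stands in for the reasoning model.
import torch

torch.manual_seed(0)
VOCAB, DIM, EOS = 100, 16, 0
model = torch.nn.Linear(DIM, VOCAB)              # frozen toy next-token head
for p in model.parameters():
    p.requires_grad_(False)

trigger = torch.randn(DIM, requires_grad=True)   # the one token embedding
opt = torch.optim.Adam([trigger], lr=0.1)

for _ in range(200):
    logits = model(trigger)
    # Minimize log p(EOS): generation never terminates once triggered.
    loss = torch.log_softmax(logits, dim=-1)[EOS]
    opt.zero_grad()
    loss.backward()
    opt.step()

eos_prob = torch.softmax(model(trigger), -1)[EOS].item()
print(f"EOS prob after tuning: {eos_prob:.2e}")
```
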
defense arXiv Jan 1, 2025

TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation

Huichi Zhou, Kin-Hei Lee, Zhonghao Zhan et al. · Imperial College London · Peking University +2 more

Defends RAG systems against corpus poisoning with two-stage filtering: clustering of retrieved documents plus LLM self-assessment to block malicious passages (sketch below)

Data Poisoning Attack Prompt Injection nlp
10 citations PDF
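
A sketch of the first stage as the summary describes it, with assumed embeddings and an assumed looseness threshold: cluster the retrieved documents and drop tight clusters of near-duplicates, the signature of injected passages. The LLM self-assessment stage is omitted.

```python
# Hedged sketch of cluster filtering for retrieved documents. The
# threshold and embedding source are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def filter_retrieved(doc_embs: np.ndarray, texts: list[str]) -> list[str]:
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_embs)
    keep = []
    for c in (0, 1):
        cluster = doc_embs[labels == c]
        # Tight, low-variance clusters are suspicious: poisoning
        # attacks often inject many near-identical passages.
        if cluster.std(axis=0).mean() > 0.05:   # assumed looseness threshold
            keep += [t for t, l in zip(texts, labels) if l == c]
    return keep

rng = np.random.default_rng(0)
benign = rng.normal(size=(5, 8))                                   # diverse docs
poison = rng.normal(size=(1, 8)) + 0.01 * rng.normal(size=(4, 8))  # near-clones
docs = [f"doc{i}" for i in range(9)]
print(filter_retrieved(np.vstack([benign, poison]), docs))   # keeps doc0-doc4
```
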