ML Security Papers

Latest papers

5 papers

defense arXiv Feb 14, 2026

Weiming Song, Xuan Xie, Ruiping Yin · Beijing University of Technology · Macau University of Science and Technology

Defends LLMs against jailbreaks by extracting safety signals from attention heads and steering logits without fine-tuning

Prompt Injection nlp

defense arXiv Jan 1, 2026 · Jan 2026

Weijie Wang, Peizhuo Lv, Yan Wang et al. · Chinese Academy of Sciences · National University of Singapore +2 more

Injects false 'adulterant' facts into proprietary Knowledge Graphs to render stolen copies unusable in competing GraphRAG deployments

Model Theft nlp graph

benchmark arXiv Dec 11, 2025 · Dec 2025

Zhuo Wang, Xiliang Liu, Ligang Sun · Beijing University of Technology

Benchmarks AI-generated video detectors' robustness to watermark removal and spoofing attacks across ten models and 6,500 videos

Output Integrity Attack vision

1 citation PDF

defense arXiv Sep 19, 2025 · Sep 2025

Laixin Zhang, Shuaibo Li, Wei Ma et al. · Beijing University of Technology · The Hong Kong University of Science and Technology (Guangzhou) +1 more

Proposes a Mixture-of-Experts framework for synthetic image detection that uses dual routing across manifold and granularity expert subspaces

Output Integrity Attack vision generative

survey Journal of Network and Compute... Jan 1, 2025 · Jan 2025

Rui Meng, Song Gao, Dayu Fan et al. · Beijing University of Posts and Telecommunications · Peng Cheng Laboratory +1 more

Surveys ML security threats and defenses across the lifecycle of AI-based semantic communication systems for 6G networks

Input Manipulation Attack · Data Poisoning Attack · Model Poisoning · nlp vision multimodal

27 citations PDF