ML Security Papers

Latest papers

5 papers

attack arXiv Apr 10, 2026 · Apr 2026

Weiyang Guo, Zesheng Shi, Zeen Zhu et al. · Harbin Institute of Technology · Huawei Technologies

Backdoor attack on RLVR-trained LLMs that implants jailbreak triggers using 2% poisoned data, degrading safety by 73%

Model Poisoning · Transfer Learning Attack · Prompt Injection · nlp · reinforcement-learning

attack arXiv Feb 13, 2026 · Feb 2026

Dong Han, Yong Li, Joachim Denzler · Huawei Technologies · Friedrich Schiller University Jena

Attacks privacy-preserving face recognition systems by inverting facial embeddings into realistic face images using Kolmogorov-Arnold Networks (KAN) and diffusion models

Model Inversion Attack · vision

benchmark arXiv Jan 7, 2026 · Jan 2026

Xing Li, Hui-Ling Zhen, Lihao Yin et al. · Huawei Technologies

Large-scale safety alignment benchmark evaluating 32 LLMs against 56 jailbreak techniques, finding that chain-of-thought (CoT) prefix attacks raise the attack success rate (ASR) by 3.34×

Prompt Injection · nlp

attack arXiv Dec 6, 2025 · Dec 2025

Chenyu Zhang, Yiwen Ma, Lanjun Wang et al. · Tianjin University · Huawei Technologies

Metaphor-based jailbreak attack that bypasses text-to-image (T2I) model safety filters without knowing which defense is deployed, using LLM-based multi-agent prompt generation

Prompt Injection · vision · nlp · multimodal · generative

1 citation

defense arXiv Nov 11, 2025 · Nov 2025

Yaxin Xiao, Qingqing Ye, Zi Liang et al. · The Hong Kong Polytechnic University · Huawei Technologies +1 more

Proposes WRK to break existing black-box model watermarks, then introduces CFW, a watermarking scheme resilient to combined extraction and removal attacks

Model Theft · vision