ML Security Papers

Latest papers

2 papers

defense arXiv Mar 16, 2026

Zhuoshang Wang, Yubing Ren, Yanan Cao et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Black-box framework for third-party watermark detection in LLM outputs using proxy models and statistical tests

Output Integrity Attack nlp

attack TrustCom Nov 17, 2025

Siyang Cheng, Gaotian Liu, Rui Mei et al. · iFLYTEK · Anhui SparkShield Intelligent Technology +5 more

Evolutionary jailbreak framework using multi-level text perturbations and semantic fitness to bypass LLM alignment at high success rates

Prompt Injection nlp