Latest papers

2 papers
defense · arXiv · Mar 16, 2026

Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

Zhuoshang Wang, Yubing Ren, Yanan Cao et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Black-box framework for third-party watermark detection in LLM outputs using proxy models and statistical tests

Output Integrity Attack · nlp
attack · TrustCom · Nov 17, 2025

ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models

Siyang Cheng, Gaotian Liu, Rui Mei et al. · iFLYTEK · Anhui SparkShield Intelligent Technology +5 more

Evolutionary jailbreak framework using multi-level text perturbations and semantic fitness to bypass LLM alignment at high success rates

Prompt Injection · nlp