Latest papers

1 papers
defense arXiv Jan 29, 2026 · 9w ago

Stay in Character, Stay Safe: Dual-Cycle Adversarial Self-Evolution for Safety Role-Playing Agents

Mingyang Liao, Yichen Wan, shuchen wu et al. · Baidu Inc. · The University of Queensland +1 more

Training-free dual-cycle framework defends LLM role-playing agents against jailbreaks while preserving persona fidelity via evolving hierarchical knowledge

Prompt Injection nlp
PDF Code