Pin-Yu Chen

defense arXiv Nov 24, 2025

Xurui Li, Kaisong Song, Rui Zhu et al. · Fudan University · Alibaba Group +3 more

A co-evolving attack-defense framework that uses MCTS-based jailbreak exploration and curriculum RL to jointly train attacker and defender, strengthening LLM safety alignment

Prompt Injection nlp

2 citations

attack arXiv Dec 1, 2025

Rongzhe Wei, Peizhi Niu, Xinjie Shen et al. · Georgia Institute of Technology · University of Illinois Urbana-Champaign +4 more

Decomposes harmful requests into innocuous sub-queries via tree search to jailbreak commercial LLM guardrails, with success rates of 95% or higher

Prompt Injection nlp

1 citation

defense arXiv Jan 7, 2026

Song-Duo Ma, Yi-Hung Liu, Hsin-Yu Lin et al. · National Taiwan University

Adversarially co-trains a retrieval-augmented fake-news detector against an LLM generator, using natural-language critiques to improve the detector's robustness

Output Integrity Attack nlp

Papers in Database (3)