Pin-Yu Chen

h-index: 3 · 837 citations · 26 papers (total)

Papers in Database (3)

defense arXiv Nov 24, 2025

Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization

Xurui Li, Kaisong Song, Rui Zhu et al. · Fudan University · Alibaba Group +3 more

Co-evolving attack-defense framework uses MCTS-based jailbreak exploration and curriculum RL to jointly train stronger LLM safety alignment

Prompt Injection nlp
2 citations
attack arXiv Dec 1, 2025

The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search

Rongzhe Wei, Peizhi Niu, Xinjie Shen et al. · Georgia Institute of Technology · University of Illinois Urbana-Champaign +4 more

Decomposes harmful requests into innocuous sub-queries via tree search to jailbreak commercial LLM guardrails at 95%+ success

Prompt Injection nlp
1 citation
defense arXiv Jan 7, 2026

RADAR: Retrieval-Augmented Detector with Adversarial Refinement for Robust Fake News Detection

Song-Duo Ma, Yi-Hung Liu, Hsin-Yu Lin et al. · National Taiwan University

Adversarially co-trains a retrieval-augmented fake-news detector against an LLM generator using natural-language critiques to improve robustness

Output Integrity Attack nlp