Latest papers

1 papers
attack arXiv Sep 26, 2025 · Sep 2025

Active Attacks: Red-teaming LLMs via Adaptive Environments

Taeyoung Yun, Pierre-Luc St-Charles, Jinkyoo Park et al. · KAIST · Mila – Québec AI Institute +3 more

RL-based red-teaming algorithm generates diverse LLM jailbreak prompts via adaptive victim fine-tuning, achieving 440x better coverage than GFlowNets

Prompt Injection nlp
1 citations PDF Code