Sahil Wadhwa

Papers in Database (1)

attack arXiv Apr 22, 2026

Adaptive Instruction Composition for Automated LLM Red-Teaming

Jesse Zymet, Andy Luo, Swapnil Shinde et al. · Capital One

RL-based red-teaming framework that adaptively composes crowdsourced jailbreak tactics to discover diverse, effective attacks against LLMs

Prompt Injection nlp