Latest papers

benchmark · arXiv · Nov 16, 2025

Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments

Samuel Nathanson, Rebecca Williams, Cynthia Matuszek · Johns Hopkins University Applied Physics Laboratory · University of Maryland

Empirically quantifies how the attacker-to-target model size ratio predicts jailbreak success across more than 6,000 multi-LLM adversarial exchanges.

Prompt Injection · NLP
1 citation