Latest papers

2 papers
benchmark · arXiv · Jan 6, 2026

The Anatomy of Conversational Scams: A Topic-Based Red Teaming Analysis of Multi-Turn Interactions in LLMs

Xiangzhe Yuan, Zhenhao Zhang, Haoming Tang et al. · University of Iowa · City University of Hong Kong

Red-teams eight LLMs as conversational scam attackers and victims across 18,648 multi-turn dialogues to map safety failure modes

Prompt Injection · nlp
attack · arXiv · Oct 15, 2025

Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach

Ziqing Lu, Lifeng Lai, Weiyu Xu · University of Iowa · University of California

Proposes a provably undefeatable training-time poisoning attack on reinforcement learning that uses rate-distortion theory to randomize the victim's observations of the transition kernel, guaranteeing victim regret via information-theoretic bounds

Data Poisoning Attack · reinforcement-learning