Yuhui Wang

benchmark arXiv Feb 18, 2026 · 6w ago

AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

Tanqiu Jiang, Yuhui Wang, Jiacheng Liang et al. · Stony Brook University

Benchmark evaluating LLM agent susceptibility to five long-horizon attack types across 28 agentic environments and 644 test cases

Prompt Injection Excessive Agency nlp

1 citations PDF Code

attack arXiv Nov 14, 2025 · Nov 2025

Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio

Guangke Chen, Yuhui Wang, Shouling Ji et al. · Stony Brook University · Zhejiang University +1 more

Jailbreaks LALM-based TTS safety alignment via semantic obfuscation and audio-modality injection to generate harmful speech

Prompt Injection audionlpmultimodal

PDF

Modern text-to-speech (TTS) systems, particularly those built on Large Audio-Language Models (LALMs), generate high-fidelity speech that faithfully reproduces input text and mimics specified speaker identities. While prior misuse studies have focused on speaker impersonation, this work explores a distinct content-centric threat: exploiting TTS systems to produce speech containing harmful content. Realizing such threats poses two core challenges: (1) LALM safety alignment frequently rejects harmful prompts, yet existing jailbreak attacks are ill-suited for TTS because these systems are designed to faithfully vocalize any input text, and (2) real-world deployment pipelines often employ input/output filters that block harmful text and audio. We present HARMGEN, a suite of five attacks organized into two families that address these challenges. The first family employs semantic obfuscation techniques (Concat, Shuffle) that conceal harmful content within text. The second leverages audio-modality exploits (Read, Spell, Phoneme) that inject harmful content through auxiliary audio channels while maintaining benign textual prompts. Through evaluation across five commercial LALMs-based TTS systems and three datasets spanning two languages, we demonstrate that our attacks substantially reduce refusal rates and increase the toxicity of generated speech. We further assess both reactive countermeasures deployed by audio-streaming platforms and proactive defenses implemented by TTS providers. Our analysis reveals critical vulnerabilities: deepfake detectors underperform on high-fidelity audio; reactive moderation can be circumvented by adversarial perturbations; while proactive moderation detects 57-93% of attacks. Our work highlights a previously underexplored content-centric misuse vector for TTS and underscore the need for robust cross-modal safeguards throughout training and deployment.

llm multimodal Stony Brook University · Zhejiang University · The Hong Kong Polytechnic University

PDF arXiv DOI

Papers in Database (2)

AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio