attack 2025

Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS

Bingyu Yan ¹, Ziyi Zhou ¹, Xiaoming Zhang ¹, Chaozhuo Li ¹, Ruilin Zeng ¹, Yirui Qi ¹, Tianbo Wang ¹, Litian Zhang ²

¹ Beihang University

² Beijing University of Posts and Telecommunications

0 citations

Published on arXiv

2508.03125

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

MAST consistently achieves high attack success rates while significantly enhancing stealthiness compared to baseline methods across diverse LLM-MAS communication architectures.

MAST

Novel technique introduced

Large language model-based multi-agent systems (LLM-MAS) effectively accomplish complex and dynamic tasks through inter-agent communication, but this reliance introduces substantial safety vulnerabilities. Existing attack methods targeting LLM-MAS either compromise agent internals or rely on direct and overt persuasion, which limits their effectiveness, adaptability, and stealthiness. In this paper, we propose MAST, a Multi-round Adaptive Stealthy Tampering framework designed to exploit communication vulnerabilities within the system. MAST integrates Monte Carlo Tree Search with Direct Preference Optimization to train an attack policy model that adaptively generates effective multi-round tampering strategies. Furthermore, to preserve stealthiness, we impose dual semantic and embedding similarity constraints during the tampering process. Comprehensive experiments across diverse tasks, communication architectures, and LLMs demonstrate that MAST consistently achieves high attack success rates while significantly enhancing stealthiness compared to baselines. These findings highlight the effectiveness, stealthiness, and adaptability of MAST, underscoring the need for robust communication safeguards in LLM-MAS.
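The abstract names Direct Preference Optimization but gives no training details on this page. As a reminder of what that objective looks like in its standard, general form, the sketch below computes the DPO loss for a single preference pair from sequence log-probabilities. The function name, the argument names, and the default `beta=0.1` are illustrative assumptions. How MAST builds its preference pairs from the tree search is not stated here and is not modeled.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    trainable policy and under a frozen reference model. `beta` is an
    assumed value, not one taken from the paper.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # loss = -log(sigmoid(beta * margin)) = softplus(-beta * margin),
    # written in a numerically stable form.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy and the reference agree on both responses, the margin is zero and the loss equals `log(2)`. It falls below that value as the policy shifts probability toward the chosen response.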

Key Contributions

MAST framework combining Monte Carlo Tree Search with Direct Preference Optimization to train an attack policy that generates adaptive multi-round message tampering strategies
Dual semantic and embedding similarity constraints that preserve stealthiness of tampered messages during the attack
Comprehensive evaluation across diverse tasks, communication architectures, and LLMs demonstrating consistent high attack success rates with improved stealthiness over baselines
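To make the idea of a dual similarity constraint concrete, here is a minimal sketch of a gate that accepts a rewritten message only if it stays close to the original under two independent measures. This page does not say how the paper computes either similarity, so both measures are stand-ins chosen to keep the example dependency-free: a token-level `difflib` ratio stands in for the semantic check and a bag-of-words cosine for the embedding check. The function names and the 0.8 thresholds are hypothetical. The same check can be read from the defender's side as a way to flag messages that drift between sender and receiver.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def surface_similarity(a: str, b: str) -> float:
    """Stand-in for a semantic score: token-sequence match ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def bow_cosine(a: str, b: str) -> float:
    """Stand-in for embedding similarity: cosine of bag-of-words counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def passes_dual_constraint(original: str, candidate: str,
                           sem_threshold: float = 0.8,
                           emb_threshold: float = 0.8) -> bool:
    """Accept the candidate only if BOTH similarity scores clear their
    (assumed) thresholds."""
    return (surface_similarity(original, candidate) >= sem_threshold
            and bow_cosine(original, candidate) >= emb_threshold)
```

Requiring both scores to pass is the point of a dual constraint: a message that looks similar under one measure but diverges under the other is rejected.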

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted

Applications

llm multi-agent systems, multi-agent communication

Similar Papers

MURMUR: Using cross-user chatter to break collaborative language agents in groups

From Storage to Steering: Memory Control Flow Attacks on LLM Agents

When Agents See Humans as the Outgroup: Belief-Dependent Bias in LLM-Powered Agents

Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems

Many-to-One Adversarial Consensus: Exposing Multi-Agent Collusion Risks in AI-Based Healthcare

BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?

Tipping the Dominos: Topology-Aware Multi-Hop Attacks on LLM-Based Multi-Agent Systems

Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections