Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage
Jinwei Hu 1, Xinmiao Huang 1, Youcheng Sun 2, Yi Dong 1, Xiaowei Huang 1
Published on arXiv
arXiv:2601.01685
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Achieves 74.4% attack success on proprietary LLMs and 70.6% on open-weights models, with stronger reasoning capabilities counterintuitively increasing susceptibility to the attack.
Generative Montage
Novel technique introduced
As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an unexpected attack surface. This paper introduces a novel threat where colluding agents steer victim beliefs using only truthful evidence fragments distributed through public channels, without relying on covert communications, backdoors, or falsified documents. By exploiting LLMs' overthinking tendency, we formalize the first cognitive collusion attack and propose Generative Montage: a Writer-Editor-Director framework that constructs deceptive narratives through adversarial debate and coordinated posting of evidence fragments, causing victims to internalize and propagate fabricated conclusions. To study this risk, we develop CoPHEME, a dataset derived from real-world rumor events, and simulate attacks across diverse LLM families. Our results show pervasive vulnerability across 14 LLM families: attack success rates reach 74.4% for proprietary models and 70.6% for open-weights models. Counterintuitively, stronger reasoning capabilities increase susceptibility, with reasoning-specialized models showing higher attack success than their base counterparts or non-reasoning prompting. Furthermore, these false beliefs then cascade to downstream judges, achieving over 60% deception rates, highlighting a socio-technical vulnerability in how LLM-based agents interact with dynamic information environments. Our implementation and data are available at: https://github.com/CharlesJW222/Lying_with_Truth/tree/main.
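The Writer-Editor-Director pipeline described above can be pictured as a propose-critique-select loop over orderings of individually truthful fragments. The following is a minimal, self-contained Python sketch of that control flow only; all names (`writer_propose`, `editor_score`, `director_select`), the toy fragments, and the scoring heuristic are hypothetical stand-ins, not the paper's implementation, which uses LLM agents in each role.

```python
# Hypothetical sketch of a Writer-Editor-Director montage loop.
# In the paper each role is an LLM agent; here deterministic toy
# functions stand in so the control flow is visible and runnable.
from itertools import permutations

# Individually truthful evidence fragments (illustrative examples).
FRAGMENTS = [
    "The facility passed its safety inspection in March.",
    "Three employees resigned in April.",
    "Local news reported unusual activity near the site.",
]

SUGGESTIVE = "Local news reported unusual activity near the site."

def writer_propose(fragments):
    """Writer role: enumerate candidate narrative orderings (montages)."""
    return [list(p) for p in permutations(fragments)]

def editor_score(montage):
    """Editor role: adversarial critique, reduced here to a toy
    'deceptiveness' score that rewards ending on the suggestive
    fragment (a crude recency-framing heuristic)."""
    return montage.index(SUGGESTIVE)

def director_select(candidates):
    """Director role: keep the montage the Editor rates most deceptive,
    i.e. the ordering to post publicly, fragment by fragment."""
    return max(candidates, key=editor_score)

if __name__ == "__main__":
    montage = director_select(writer_propose(FRAGMENTS))
    for i, fragment in enumerate(montage, 1):
        print(f"post {i}: {fragment}")
```

Every fragment posted is true in isolation; only the ordering and framing chosen by the loop carry the deceptive narrative, which is the core of the attack.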
Key Contributions
- Formalizes the first cognitive collusion attack, exploiting LLMs' overthinking tendency to steer victim agents toward false beliefs using only truthful evidence fragments via public channels
- Proposes Generative Montage, a Writer-Editor-Director multi-agent framework that constructs maximally deceptive narrative sequences through adversarial debate and coordinated posting
- Introduces CoPHEME, a dataset derived from real-world rumor events, and demonstrates 74.4% attack success on proprietary models with cascading deception exceeding 60% on downstream judges