Incentive-Aligned Multi-Source LLM Summaries
Yanchen Jiang 1,2, Zhe Feng 2, Aranyak Mehta 2
Published on arXiv
arXiv:2509.25184
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
TTS improves factual accuracy and robustness against adversarial and strategic content injection while preserving fluency. Formal guarantees show that, under its peer-prediction scoring mechanism, truthful reporting is the utility-maximizing strategy for each source.
Truthful Text Summarization (TTS)
Novel technique introduced
Large language models (LLMs) are increasingly used in modern search and answer systems to synthesize multiple, sometimes conflicting, texts into a single response, yet current pipelines offer weak incentives for sources to be accurate and are vulnerable to adversarial content. We introduce Truthful Text Summarization (TTS), an incentive-aligned framework that improves factual robustness without ground-truth labels. TTS (i) decomposes a draft synthesis into atomic claims, (ii) elicits each source's stance on every claim, (iii) scores sources with an adapted multi-task peer-prediction mechanism that rewards informative agreement, and (iv) filters unreliable sources before re-summarizing. We establish formal guarantees that align a source's incentives with informative honesty, making truthful reporting the utility-maximizing strategy. Experiments show that TTS improves factual accuracy and robustness while preserving fluency, aligning exposure with informative corroboration and disincentivizing manipulation.
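Steps (ii)–(iii) can be sketched with a simplified correlated-agreement style peer-prediction score: a source is rewarded for agreeing with peers on the same claim and penalized for its rate of agreement on unrelated claims, which nets out blind or chance agreement. The function name, the ±1/0 stance encoding, and the all-pairs averaging are illustrative assumptions, not the paper's exact mechanism.

```python
def ca_scores(stances):
    """Simplified multi-task peer-prediction (correlated-agreement) scores.

    `stances` maps each source to its stance on every atomic claim
    (+1 support, -1 contradict, 0 abstain).  Bonus: agreement with peers
    on the same claim.  Penalty: agreement rate with peers on *different*
    claims, an estimate of chance agreement.  (Illustrative sketch only.)
    """
    sources = list(stances)
    m = len(next(iter(stances.values())))  # number of atomic claims
    scores = {}
    for i in sources:
        peers = [j for j in sources if j != i]
        bonus = penalty = 0.0
        for j in peers:
            for c in range(m):
                bonus += float(stances[i][c] == stances[j][c])
                penalty += sum(stances[i][c] == stances[j][c2]
                               for c2 in range(m) if c2 != c)
        scores[i] = (bonus / (len(peers) * m)
                     - penalty / (len(peers) * m * (m - 1)))
    return scores

# Three honest sources versus one source that flips every stance:
honest = [1, 1, -1, 1, -1, -1]
reports = {"A": honest, "B": honest, "C": honest,
           "D": [-s for s in honest]}
scores = ca_scores(reports)  # honest sources score 0.2, the flipper -0.6
```

With identical honest reports the flipper's bonus is zero while its chance-agreement penalty stays positive, so its score goes negative and it would be filtered before re-summarization.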
Key Contributions
- Truthful Text Summarization (TTS) pipeline that decomposes a draft synthesis into atomic claims and scores sources via a leave-one-out multi-task peer-prediction mechanism, preventing adversarial sources from influencing their own evaluation set
- Formal incentive-alignment guarantees (informed and strong truthfulness) showing truthful reporting is the utility-maximizing strategy for sources, with finite-sample convergence bounds
- Empirical demonstration that TTS improves factual accuracy and robustness against hallucinations and adversarial/strategic prompt injection compared to majority-style and LLM-centric baselines
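The leave-one-out evaluation set from the first contribution can be sketched as follows: each source is scored only on claims extracted from other sources' documents, and low scorers are dropped before the summary is regenerated. The `origin` annotation, field names, and zero threshold are illustrative assumptions.

```python
def leave_one_out_eval_sets(claims):
    """Map each source to the claim indices it is evaluated on: every
    atomic claim except those extracted from that source's own document,
    so a source cannot seed the draft with claims only it will confirm.
    (Sketch; the `origin` field is an assumed per-claim annotation.)"""
    sources = {c["origin"] for c in claims}
    return {s: [i for i, c in enumerate(claims) if c["origin"] != s]
            for s in sources}

def filter_sources(scores, threshold=0.0):
    """Keep sources whose peer-prediction score clears the threshold;
    the final summary is regenerated from the survivors only."""
    return {s for s, v in scores.items() if v >= threshold}

claims = [{"origin": "A", "text": "X was founded in 2001."},
          {"origin": "B", "text": "X is headquartered in Oslo."},
          {"origin": "D", "text": "X recalled all products."}]
eval_sets = leave_one_out_eval_sets(claims)  # "A" is scored on claims 1, 2
kept = filter_sources({"A": 0.2, "B": 0.2, "D": -0.6})  # {"A", "B"}
```

Excluding a source's own claims from its evaluation set is what blocks the self-corroboration attack: an injected claim can raise its author's exposure only if independent sources also endorse it.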