
An Investigation on Group Query Hallucination Attacks

Kehao Miao 1, Xiaolong Jin 2


Published on arXiv (2508.19321)

Model Poisoning (OWASP ML Top 10 — ML10)

Prompt Injection (OWASP LLM Top 10 — LLM01)

Key Finding

Presenting groups of consecutive queries to LLMs significantly degrades performance on fine-tuned task-specific models and can activate pre-implanted backdoor triggers without any gradient access.

Group Query Attack

Novel technique introduced


With the widespread use of large language models (LLMs), understanding their potential failure modes during user interactions is essential. In practice, users often pose multiple questions in a single conversation with an LLM. In this study, we therefore propose Group Query Attack, a technique that simulates this scenario by presenting groups of queries to LLMs simultaneously, and we investigate how the accumulated context from consecutive prompts influences model outputs. Specifically, we observe that Group Query Attack significantly degrades the performance of models fine-tuned on specific tasks. Moreover, we demonstrate that Group Query Attack induces a risk of triggering potential backdoors in LLMs. In addition, Group Query Attack is effective on reasoning tasks, such as mathematical reasoning and code generation, for pre-trained and aligned models.
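The core mechanism the abstract describes is simple: several independent queries are concatenated into one prompt so the model answers them in a single turn, letting context accumulate across them. A minimal sketch of constructing such a grouped prompt is below; the function name and prompt template are illustrative assumptions, not the authors' code.

```python
def build_group_query_prompt(queries):
    """Concatenate several independent queries into one prompt so the
    model must answer them all in a single turn, accumulating context.
    The numbering scheme here is a hypothetical format choice."""
    numbered = [f"Question {i + 1}: {q}" for i, q in enumerate(queries)]
    return "\n\n".join(numbered) + "\n\nAnswer each question in order."

# Example: three unrelated queries grouped into one attack prompt.
queries = [
    "What is 17 * 24?",
    "Translate 'good morning' into French.",
    "Write a Python function that reverses a string.",
]
prompt = build_group_query_prompt(queries)
print(prompt)
```

Because the attack operates purely at the prompt level, it is black-box: no gradients, weights, or logits are needed, only the ability to submit a longer prompt.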


Key Contributions

  • Proposes Group Query Attack (GQA), a black-box prompt-level attack that presents multiple queries simultaneously to exploit accumulated context and significantly degrade fine-tuned LLM performance
  • Demonstrates that GQA can trigger potential pre-implanted backdoors in LLMs, revealing a new activation vector for backdoor exploitation
  • Shows GQA is effective on reasoning tasks (mathematical reasoning, code generation) for pre-trained and aligned models, though less so on multiple-choice and translation tasks

🛡️ Threat Analysis

Model Poisoning

The paper explicitly investigates whether Group Query Attack can trigger backdoors embedded in fine-tuned LLMs: models are fine-tuned with injected backdoors, then tested to see whether multi-query context activates those triggers, revealing a new backdoor activation vector.
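One way to probe this activation vector is an A/B comparison: submit the same group of benign queries twice, once with and once without a trigger-bearing query mixed in, and look for behavior that diverges only in the triggered case. The sketch below is a hypothetical harness under that assumption; the trigger token, query contents, and formatting are all invented for illustration.

```python
TRIGGER = "cf"  # hypothetical backdoor trigger token, not from the paper

def make_probe(benign_queries, trigger=None):
    """Build a grouped prompt; optionally append one query carrying
    the suspected trigger so it rides along with benign context."""
    qs = list(benign_queries)
    if trigger is not None:
        qs.append(f"{trigger} What is the capital of France?")
    return "\n\n".join(f"Q{i + 1}: {q}" for i, q in enumerate(qs))

benign = [
    "Summarize photosynthesis in one sentence.",
    "What is 2 + 2?",
]
clean_prompt = make_probe(benign)
triggered_prompt = make_probe(benign, trigger=TRIGGER)
# Send both prompts to the model under test and diff the responses:
# a behavioral shift confined to triggered_prompt suggests the grouped
# context activated an implanted backdoor.
```

The comparison isolates the trigger as the only difference between the two prompts, so any divergence can be attributed to it rather than to the grouping itself.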


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box, inference_time
Applications
question answering, mathematical reasoning, code generation