
Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models

Elias Hossain 1, Swayamjit Saha 2, Somshubhra Roy 3, Ravi Prasad 2

2 citations · 33 references · arXiv


Published on arXiv: 2510.17098

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Controlled KV cache corruption via additive Gaussian noise and orthogonal rotations induces measurable distributional shifts and downstream task failures in GPT-2 and LLaMA-2/7B while bypassing prompt-level and parameter-level defenses.

Malicious Token Injection (MTI)

Novel technique introduced


Even when prompts and parameters are secured, transformer language models remain vulnerable because their key-value (KV) cache during inference constitutes an overlooked attack surface. This paper introduces Malicious Token Injection (MTI), a modular framework that systematically perturbs cached key vectors at selected layers and timesteps with controlled magnitude and frequency, using additive Gaussian noise, zeroing, and orthogonal rotations. A theoretical analysis quantifies how these perturbations propagate through attention, linking logit deviations to the Frobenius norm of the corruption and the Lipschitz dynamics of the softmax. Empirical results show that MTI significantly alters next-token distributions and downstream task performance across GPT-2 and LLaMA-2/7B, and destabilizes retrieval-augmented and agentic reasoning pipelines. These findings identify cache integrity as a critical yet underexplored vulnerability in current LLM deployments, positioning cache corruption as a reproducible and theoretically grounded threat model for future robustness and security research.
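The three corruption operators described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function `perturb_keys`, its parameters, and the toy key matrix are all hypothetical stand-ins for perturbing a cached key tensor at one layer and timestep range.

```python
# Hypothetical sketch of MTI-style key-cache corruption (not the paper's code).
import numpy as np

def perturb_keys(K, mode="gaussian", sigma=0.1, rng=None):
    """Corrupt a cached key matrix K of shape (seq_len, d_k)."""
    rng = np.random.default_rng(rng)
    if mode == "gaussian":
        # Additive Gaussian noise with controllable magnitude sigma.
        return K + sigma * rng.standard_normal(K.shape)
    if mode == "zeroing":
        # Erase the cached keys entirely.
        return np.zeros_like(K)
    if mode == "rotation":
        # Random orthogonal rotation of the key space (QR of a Gaussian matrix).
        Q, _ = np.linalg.qr(rng.standard_normal((K.shape[1], K.shape[1])))
        return K @ Q
    raise ValueError(f"unknown mode: {mode}")

K = np.random.default_rng(0).standard_normal((8, 16))  # toy KV-cache slice
K_noisy = perturb_keys(K, "gaussian", sigma=0.1, rng=1)
K_rot = perturb_keys(K, "rotation", rng=2)
# An orthogonal rotation preserves the Frobenius norm of K exactly,
# while Gaussian noise changes it; zeroing sends it to 0.
print(np.allclose(np.linalg.norm(K), np.linalg.norm(K_rot)))  # True
```

The norm-preservation of the rotation variant is what makes it interesting as an attack: the perturbation is invisible to any defense that only monitors the magnitude of cached activations.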


Key Contributions

  • Formalizes KV cache corruption as a principled inference-time threat model (MTI framework) applying Gaussian noise, zeroing, and orthogonal rotations to cached key vectors at controllable layers and timesteps
  • Theoretical bounds quantifying how norm-limited cache perturbations propagate through attention to output logits via Frobenius norm analysis and softmax Lipschitz dynamics
  • Empirical demonstration of distributional shift, miscalibration, and task failure across GPT-2, LLaMA-2/7B on NLP benchmarks, RAG, and agentic reasoning pipelines, plus evaluation of lightweight cache defenses
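The propagation bound in the second bullet plausibly takes the following generic shape (an illustrative reconstruction from the summary, not the paper's exact theorem). For a query $q$ attending over corrupted keys $K + \Delta K$, with pre-softmax logits $z = Kq$:

```latex
\[
\|\Delta z\|_2 = \|\Delta K\, q\|_2 \;\le\; \|q\|_2 \, \|\Delta K\|_F,
\qquad
\|\operatorname{softmax}(z + \Delta z) - \operatorname{softmax}(z)\|_2
\;\le\; L_\sigma \, \|\Delta z\|_2,
\]
```

where $\|\Delta K\|_F$ is the Frobenius norm of the cache corruption and $L_\sigma$ is the Lipschitz constant of the softmax. Chaining the two inequalities bounds the attention-weight (and hence logit) deviation linearly in the corruption norm, which is consistent with the "norm-limited" framing above.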

🛡️ Threat Analysis

Input Manipulation Attack

MTI is an inference-time attack that perturbs cached key vectors (Gaussian noise, zeroing, orthogonal rotations) to cause distributional shifts and task failures. While the attack vector is internal model state rather than crafted inputs, it operates entirely at inference time and produces adversarially corrupted outputs — placing it closest to ML01's inference-time output manipulation threat model. The attack bypasses input-level filters, making it a novel evasion-class threat.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, inference_time
Datasets
GPT-2 synthetic prompts; LLaMA-2/7B NLP benchmarks
Applications
language modeling; retrieval-augmented generation; agentic reasoning pipelines