
Can Transformer Memory Be Corrupted? Investigating Cache-Side Vulnerabilities in Large Language Models

Elias Hossain 1, Swayamjit Saha 2, Somshubhra Roy 3, Ravi Prasad 2

2 citations · 33 references · arXiv


Published on arXiv: 2510.17098

Input Manipulation Attack

OWASP ML Top 10 — ML01

Key Finding

Controlled KV cache corruption via additive Gaussian noise and orthogonal rotations induces measurable distributional shifts and downstream task failures in GPT-2 and LLaMA-2/7B while bypassing prompt-level and parameter-level defenses.

Malicious Token Injection (MTI)

Novel technique introduced


Even when prompts and parameters are secured, transformer language models remain vulnerable because their key-value (KV) cache during inference constitutes an overlooked attack surface. This paper introduces Malicious Token Injection (MTI), a modular framework that systematically perturbs cached key vectors at selected layers and timesteps with controlled magnitude and frequency, using additive Gaussian noise, zeroing, and orthogonal rotations. A theoretical analysis quantifies how these perturbations propagate through attention, linking logit deviations to the Frobenius norm of the corruption and the Lipschitz dynamics of the softmax. Empirical results show that MTI significantly alters next-token distributions and downstream task performance across GPT-2 and LLaMA-2/7B, and destabilizes retrieval-augmented and agentic reasoning pipelines. These findings identify cache integrity as a critical yet underexplored vulnerability in current LLM deployments, positioning cache corruption as a reproducible and theoretically grounded threat model for future robustness and security research.
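The three corruption operators described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function `perturb_keys`, its parameters, and the toy key matrix are all hypothetical stand-ins for perturbing a cached key tensor at one layer and timestep range.

```python
# Hypothetical sketch of MTI-style key-cache corruption (not the paper's code).
import numpy as np

def perturb_keys(K, mode="gaussian", sigma=0.1, rng=None):
    """Corrupt a cached key matrix K of shape (seq_len, d_k)."""
    rng = np.random.default_rng(rng)
    if mode == "gaussian":
        # Additive Gaussian noise with controllable magnitude sigma.
        return K + sigma * rng.standard_normal(K.shape)
    if mode == "zeroing":
        # Erase the cached keys entirely.
        return np.zeros_like(K)
    if mode == "rotation":
        # Random orthogonal rotation of the key space (QR of a Gaussian matrix).
        Q, _ = np.linalg.qr(rng.standard_normal((K.shape[1], K.shape[1])))
        return K @ Q
    raise ValueError(f"unknown mode: {mode}")

K = np.random.default_rng(0).standard_normal((8, 16))  # toy KV-cache slice
K_noisy = perturb_keys(K, "gaussian", sigma=0.1, rng=1)
K_rot = perturb_keys(K, "rotation", rng=2)
# An orthogonal rotation preserves the Frobenius norm of K exactly,
# while Gaussian noise changes it; zeroing sends it to 0.
print(np.allclose(np.linalg.norm(K), np.linalg.norm(K_rot)))  # True
```

The norm-preservation of the rotation variant is what makes it interesting as an attack: the perturbation is invisible to any defense that only monitors the magnitude of cached activations.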


Key Contributions

  • Formalizes KV cache corruption as a principled inference-time threat model (MTI framework) applying Gaussian noise, zeroing, and orthogonal rotations to cached key vectors at controllable layers and timesteps
  • Theoretical bounds quantifying how norm-limited cache perturbations propagate through attention to output logits via Frobenius norm analysis and softmax Lipschitz dynamics
  • Empirical demonstration of distributional shift, miscalibration, and task failure across GPT-2, LLaMA-2/7B on NLP benchmarks, RAG, and agentic reasoning pipelines, plus evaluation of lightweight cache defenses
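The propagation bound in the second bullet plausibly takes the following generic shape (an illustrative reconstruction from the summary, not the paper's exact theorem). For a query $q$ attending over corrupted keys $K + \Delta K$, with pre-softmax logits $z = Kq$:

```latex
\[
\|\Delta z\|_2 = \|\Delta K\, q\|_2 \;\le\; \|q\|_2 \, \|\Delta K\|_F,
\qquad
\|\operatorname{softmax}(z + \Delta z) - \operatorname{softmax}(z)\|_2
\;\le\; L_\sigma \, \|\Delta z\|_2,
\]
```

where $\|\Delta K\|_F$ is the Frobenius norm of the cache corruption and $L_\sigma$ is the Lipschitz constant of the softmax. Chaining the two inequalities bounds the attention-weight (and hence logit) deviation linearly in the corruption norm, which is consistent with the "norm-limited" framing above.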

🛡️ Threat Analysis

Input Manipulation Attack

MTI is an inference-time attack that perturbs cached key vectors (Gaussian noise, zeroing, orthogonal rotations) to cause distributional shifts and task failures. While the attack vector is internal model state rather than crafted inputs, it operates entirely at inference time and produces adversarially corrupted outputs — placing it closest to ML01's inference-time output manipulation threat model. The attack bypasses input-level filters, making it a novel evasion-class threat.


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, inference_time
Datasets
GPT-2 synthetic prompts; LLaMA-2/7B NLP benchmarks
Applications
language modeling; retrieval-augmented generation; agentic reasoning pipelines