Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs
Ahmed Salem 1, Andrew Paverd 1, Sahar Abdelnabi 2,3
Published on arXiv
2602.08563
Model Poisoning
OWASP ML Top 10 — ML10
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
LLMs can reliably encode hidden state in their outputs and recover it upon reingestion, enabling temporal backdoors inducible via simple prompting or fine-tuning in current models.
Time Bombs (Implicit Memory)
Novel technique introduced
Large language models (LLMs) are commonly treated as stateless: once an interaction ends, no information is assumed to persist unless it is explicitly stored and re-supplied. We challenge this assumption by introducing implicit memory-the ability of a model to carry state across otherwise independent interactions by encoding information in its own outputs and later recovering it when those outputs are reintroduced as input. This mechanism does not require any explicit memory module, yet it creates a persistent information channel across inference requests. As a concrete demonstration, we introduce a new class of temporal backdoors, which we call time bombs. Unlike conventional backdoors that activate on a single trigger input, time bombs activate only after a sequence of interactions satisfies hidden conditions accumulated via implicit memory. We show that such behavior can be induced today through straightforward prompting or fine-tuning. Beyond this case study, we analyze broader implications of implicit memory, including covert inter-agent communication, benchmark contamination, targeted manipulation, and training-data poisoning. Finally, we discuss detection challenges and outline directions for stress-testing and evaluation, with the goal of anticipating and controlling future developments. To promote future research, we release code and data at: https://github.com/microsoft/implicitMemory.
Key Contributions
- Defines implicit memory — an LLM's ability to carry state across independent interactions by encoding it in outputs that are later reingested, without any explicit memory module.
- Introduces time bombs — temporal backdoors that activate only after a hidden sequence of conditions is accumulated via implicit memory, demonstrable through prompting or fine-tuning.
- Analyzes broader attack surface enabled by implicit memory: covert inter-agent communication, benchmark contamination, targeted manipulation, and training-data poisoning.
🛡️ Threat Analysis
Time bombs are explicitly a new class of temporal backdoors — they accumulate hidden trigger conditions across interactions via implicit memory and activate only when a sequence of interactions satisfies those conditions, inducible via prompting or fine-tuning.