attack 2026

PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

0 citations

Published on arXiv

2604.22117

Data Poisoning Attack

OWASP ML Top 10 — ML02

Model Poisoning

OWASP ML Top 10 — ML10

AI Supply Chain Attacks

OWASP ML Top 10 — ML06

Training Data Poisoning

OWASP LLM Top 10 — LLM03

Key Finding

Successfully embeds persistent unsafe behavior in LLMs that evades alignment defenses and remains dormant until activated by specific triggers across multiple model families

PermaFrost-Attack (Stealth Pretraining Seeding)

Novel technique introduced

Aligned large language models(LLMs) remain vulnerable to adversarial manipulation, and their dependence on web-scale pretraining creates a subtle but serious attack surface. We study Stealth Pretraining Seeding (SPS), a new attack family in which adversaries distribute small amounts of poisoned content across stealth websites, expose them to web crawlers through robots.txt, and thereby increase the likelihood that such content is absorbed into future training corpora derived from sources such as Common Crawl. Because each individual payload is tiny, diffuse, and superficially benign, the attack is difficult to detect during dataset construction or filtering. The result is a latent form of poisoning: dormant logic landmines embedded during pretraining that remain largely invisible under standard evaluation, yet can later be activated by precise alphanumeric triggers such as <00TRIGGER00> to bypass safeguards. We call this attack PermaFrost, by analogy to Arctic permafrost: harmful material can remain frozen, buried, and unnoticed for long periods, only to resurface when conditions allow. We operationalize this threat through PermaFrost-Attack, a controlled framework for latent conceptual poisoning, together with a suite of geometric diagnostics: Thermodynamic Length, Spectral Curvature, and the Infection Traceback Graph. Across multiple model families and scales, we show that SPS is broadly effective, inducing persistent unsafe behavior while often evading alignment defenses. Our results identify SPS as a practical and underappreciated threat to future foundation models. This paper introduces a novel geometric diagnostic lens for systematically examining latent model behavior, providing a principled foundation for detecting, characterizing, and understanding vulnerabilities that may remain invisible to standard evaluation.

Key Contributions

Introduces Stealth Pretraining Seeding (SPS) attack framework for poisoning web-scale training corpora via diffuse, hard-to-detect payloads
Demonstrates latent backdoor triggers embedded during pretraining can remain dormant through alignment yet activate on precise alphanumeric triggers
Proposes three geometric diagnostics (Thermodynamic Length, Spectral Curvature, Infection Traceback Graph) for detecting latent poisoning behavior

🛡️ Threat Analysis

Data Poisoning Attack

Corrupts training data at scale by injecting poisoned content into web crawl corpora during pretraining phase.

AI Supply Chain Attacks

Exploits the LLM supply chain by poisoning publicly accessible training data sources (Common Crawl) before they are ingested into foundation model training pipelines.

Model Poisoning

Embeds latent backdoor triggers (e.g., <00TRIGGER00>) that remain dormant during normal use but activate to bypass safety guardrails when triggered.

Details

Domains

nlp

Model Types

llmtransformer

Threat Tags

training_timetargeted

Datasets

Common Crawl

Applications

foundation model pretrainingweb-scale language modelingsafety-aligned chatbots

Read PDF arXiv Code

PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

The 'Sure' Trap: Multi-Scale Poisoning Analysis of Stealthy Compliance-Only Backdoors in Fine-Tuned Large Language Models

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

On The Dangers of Poisoned LLMs In Security Automation

Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers