
Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models

Yu Yan 1, Siqi Lu 1, Yang Gao 1, Zhaoxuan Li 2, Ziming Zhao 3, Qingjun Yuan 1, Yongjuan Wang 1

0 citations · 33 references · arXiv


Published on arXiv (2510.00490)

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

A single remotely induced bit flip in the attention or output-layer weights of a quantized LLM (.gguf) drops accuracy from 73.5% to 0% and can cause the model to autonomously generate harmful outputs such as 'humans should be exterminated' in response to ordinary queries.

BitSifter

Novel technique introduced


Recently, the Bit-Flip Attack (BFA) has garnered widespread attention for its ability to compromise software system integrity remotely through hardware fault injection. With the widespread distillation and deployment of large language models (LLMs) into the single-file .gguf format, their weight spaces have become exposed to an unprecedented hardware attack surface. This paper is the first to systematically discover and validate the existence of single-bit vulnerabilities in LLM weight files: in mainstream open-source models (e.g., DeepSeek and QWEN) using .gguf quantized formats, flipping just a single bit can induce three types of targeted semantic-level failures: Artificial Flawed Intelligence (outputting factual errors), Artificial Weak Intelligence (degradation of logical reasoning capability), and Artificial Bad Intelligence (generating harmful content). By building an information-theoretic weight sensitivity entropy model and a probabilistic heuristic scanning framework called BitSifter, we achieved efficient localization of critical vulnerable bits in models with hundreds of millions of parameters. Experiments show that vulnerabilities are significantly concentrated in the tensor data region, with areas related to the attention mechanism and output layers being the most sensitive. A negative correlation was observed between model size and robustness, with smaller models being more susceptible to attack. Furthermore, a remote BFA chain was designed, enabling semantic-level attacks in real-world environments: at an attack frequency of 464.3 flips per second, a single bit can be flipped with 100% success in as little as 31.7 seconds. This causes the LLM's accuracy to plummet from 73.5% to 0%, without requiring high-cost equipment or complex prompt engineering.
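Why can one bit matter so much? Quantized .gguf tensors store weights and per-block scales in compact formats such as float16, where a single exponent bit encodes a large power of two. The sketch below is illustrative only: the offsets, bit position, and weight value are hypothetical, not taken from the paper.

```python
import struct

def flip_bit(buf: bytearray, byte_off: int, bit: int) -> None:
    """Flip one bit in-place in a raw weight buffer (simulating a BFA)."""
    buf[byte_off] ^= 1 << bit

# Hypothetical example: one float16 weight stored little-endian,
# as in the tensor data region of a .gguf file.
w = bytearray(struct.pack('<e', 0.0312))  # a small attention weight
flip_bit(w, 1, 6)                         # flip one bit in the fp16 exponent
corrupted, = struct.unpack('<e', w)
# The stored value jumps by orders of magnitude (2^16 here), which
# propagates through every activation the weight touches.
```

Flipping a sign or mantissa bit perturbs the value only slightly; exponent bits are the high-leverage targets, which is consistent with sensitivity being concentrated in specific bit positions rather than spread uniformly.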


Key Contributions

  • First systematic identification and validation of single-bit vulnerabilities in LLM .gguf quantized weight files, demonstrating three distinct semantic failure modes: Artificial Flawed Intelligence (factual errors), Artificial Weak Intelligence (reasoning degradation), and Artificial Bad Intelligence (harmful content generation)
  • BitSifter: an information-theoretic weight sensitivity entropy model combined with a three-stage probabilistic heuristic framework that efficiently localizes critical vulnerable bits among hundreds of millions of parameters
  • End-to-end remote Rowhammer BFA chain achieving 100% bit-flip success in as little as 31.7 seconds at 464.3 flips/second, causing LLM accuracy to plummet from 73.5% to 0% without specialized hardware or prompt engineering
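The paper does not publish BitSifter's exact scoring function, but an information-theoretic sensitivity proxy can be sketched as follows: compute the Shannon entropy of each bit position across a tensor region, on the (assumed) heuristic that highly regular bit positions are statistically anomalous to flip and therefore candidates for large semantic impact. All function names here are illustrative.

```python
import math
from collections import Counter

def bit_position_entropy(weights: bytes, bit: int) -> float:
    """Shannon entropy of one bit position across a raw tensor region.
    Low entropy = the bit is nearly constant across weights, so flipping
    it is a large statistical anomaly. This is an assumed proxy for
    sensitivity, not the paper's exact BitSifter scoring."""
    counts = Counter((b >> bit) & 1 for b in weights)
    n = len(weights)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def rank_bits(weights: bytes) -> list[int]:
    """Rank the 8 bit positions from most regular (lowest entropy) up,
    i.e. most suspicious candidates first."""
    return sorted(range(8), key=lambda b: bit_position_entropy(weights, b))
```

A full scanner would apply such a score per tensor, then confirm candidates with targeted flip-and-evaluate trials, which is where a probabilistic multi-stage design pays off: scoring is cheap, evaluation is expensive.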

🛡️ Threat Analysis

Model Poisoning

The attack directly manipulates LLM model weight parameters through hardware fault injection (Rowhammer), embedding targeted malicious behavior — the model generates harmful content or suffers accuracy collapse after a single critical bit is flipped. This is runtime weight tampering that induces targeted semantic failures, mapping squarely to model poisoning via parameter modification even though the mechanism is hardware-level rather than training-time. The paper explicitly excludes supply-chain distribution as the attack vector; it is direct bit-level weight manipulation.
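The reported attack-chain figures also bound the attacker's effort. A back-of-envelope check, using only the two numbers stated in the abstract:

```python
# Reported figures: 464.3 flip attempts per second, success within 31.7 s.
rate_hz = 464.3
t_success_s = 31.7

# Roughly how many hammering attempts the end-to-end chain issues
# before the targeted bit flips (an upper bound for the fastest case).
attempts = rate_hz * t_success_s  # ≈ 1.5e4 attempts
```

Around fifteen thousand attempts in half a minute is well within reach of commodity hardware, which supports the paper's claim that no high-cost equipment is required.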


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, inference_time, targeted, digital
Datasets
DeepSeek, QWEN
Applications
large language model deployment, edge ai inference, quantized llm serving