SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models
Haotian Xu 1,2, Qingsong Peng 1,2, Jie Shi 2, Huadi Zheng 2, Yu Li 1,2, Zhuoyang Chen 1
Published on arXiv (2509.17371)
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
On LLaMA-3.1-8B-Instruct (INT8), 50 bit flips drop GSM8K accuracy from 65.7% to 7.6% while naturalness score falls only from 66.0 to 61.1, versus GenBFA which reaches 0% accuracy but causes complete output collapse (naturalness 0, perplexity 5.5×10⁵)
SilentStriker
Novel technique introduced
The rapid adoption of large language models (LLMs) in critical domains has spurred extensive research into their security. While input manipulation attacks (e.g., prompt injection) have been well studied, Bit-Flip Attacks (BFAs) -- which exploit hardware vulnerabilities to corrupt model parameters and cause severe performance degradation -- have received far less attention. Existing BFA methods suffer from a key limitation: they fail to balance performance degradation against output naturalness, making them prone to discovery. In this paper, we introduce SilentStriker, the first stealthy bit-flip attack against LLMs that effectively degrades task performance while maintaining output naturalness. Our core contribution lies in addressing the challenge of designing effective loss functions for LLMs, whose outputs have variable length and a vast token space. Unlike prior approaches that formulate the attack loss around output perplexity, which inevitably degrades output naturalness, we reformulate the attack objective by treating key output tokens as suppression targets, enabling effective joint optimization of attack effectiveness and stealthiness. Additionally, we employ an iterative, progressive search strategy to maximize attack efficacy. Experiments show that SilentStriker significantly outperforms existing baselines, achieving successful attacks without compromising the naturalness of generated text.
Key Contributions
- First stealthy bit-flip attack against LLMs (SilentStriker) that degrades task performance while preserving output naturalness, achieved by flipping ~50 bits out of billions of parameters
- Token-based loss formulation that suppresses critical output tokens while constraining perplexity, enabling joint optimization of attack effectiveness and stealthiness without the naturalness collapse seen in prior BFA methods
- Iterative progressive search strategy for identifying optimal attack locations, plus an improved bit-selection strategy for FP4-quantized models using LUT mapping
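The token-based loss in the second contribution can be sketched as follows. This is a minimal, illustrative formulation, not the paper's exact objective: the function names, the KL-divergence stealth term, and the weighting `lam` are all assumptions. The idea is that minimizing the loss drives the probability of key answer tokens toward zero (attack term) while keeping the output distribution close to its pre-attack shape, which keeps perplexity and hence naturalness roughly intact (stealth term).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stealthy_attack_loss(logits, key_token_ids, ref_probs, lam=1.0):
    """Illustrative SilentStriker-style objective (names/weights assumed).

    Attack term: sum of log-probs of the key output tokens; minimizing it
    pushes those probabilities toward zero, suppressing the correct answer.
    Stealth term: KL divergence from the pre-attack distribution ref_probs,
    penalizing changes that would visibly degrade output naturalness.
    """
    probs = softmax(logits)
    suppress = sum(math.log(probs[t] + 1e-12) for t in key_token_ids)
    kl = sum(p * math.log((p + 1e-12) / (q + 1e-12))
             for p, q in zip(ref_probs, probs))
    return suppress + lam * kl
```

A perplexity-based loss, by contrast, would reward making *all* tokens unlikely, which is exactly the output collapse the paper reports for GenBFA; suppressing only the key tokens is what lets the model keep producing fluent text.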
🛡️ Threat Analysis
SilentStriker corrupts deployed LLM model parameters by flipping bits in DRAM, directly poisoning model weights to induce severe performance degradation — a form of model poisoning targeting the model's internal parameters rather than training data or inputs.
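The underlying fault model can be illustrated by flipping a single bit of a two's-complement INT8 weight, as a Rowhammer-style DRAM fault would. This is a hedged sketch; the helper name is hypothetical and real attacks flip bits in memory rather than via software.

```python
def flip_bit_int8(weight, bit):
    """Flip one bit of a signed 8-bit (two's-complement) weight.

    Illustrates why a single fault matters: flipping the sign/high bit
    of an INT8 value shifts it by 128, a large change for a quantized
    model weight.
    """
    assert -128 <= weight <= 127 and 0 <= bit <= 7
    raw = weight & 0xFF      # view the value as its two's-complement byte
    raw ^= (1 << bit)        # the single-bit hardware fault
    return raw - 256 if raw >= 128 else raw
```

For example, flipping bit 7 of the weight 3 yields -125, turning a small positive weight into a large negative one; SilentStriker's search is about choosing the ~50 such flips that break task accuracy without making the damage visible in the output text.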