
SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models

Haotian Xu 1,2, Qingsong Peng 1,2, Jie Shi 2, Huadi Zheng 2, Yu Li 1,2, Zhuoyang Chen 1

1 citation · 33 references · arXiv


Published on arXiv (2509.17371)

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

On LLaMA-3.1-8B-Instruct (INT8), 50 bit flips drop GSM8K accuracy from 65.7% to 7.6% while naturalness score falls only from 66.0 to 61.1, versus GenBFA which reaches 0% accuracy but causes complete output collapse (naturalness 0, perplexity 5.5×10⁵)

SilentStriker

Novel technique introduced


The rapid adoption of large language models (LLMs) in critical domains has spurred extensive research into their security issues. While input manipulation attacks (e.g., prompt injection) have been well studied, Bit-Flip Attacks (BFAs) -- which exploit hardware vulnerabilities to corrupt model parameters and cause severe performance degradation -- have received far less attention. Existing BFA methods suffer from key limitations: they fail to balance performance degradation and output naturalness, making them prone to discovery. In this paper, we introduce SilentStriker, the first stealthy bit-flip attack against LLMs that effectively degrades task performance while maintaining output naturalness. Our core contribution lies in addressing the challenge of designing effective loss functions for LLMs with variable output length and the vast output space. Unlike prior approaches that rely on output perplexity for attack loss formulation, which inevitably degrade output naturalness, we reformulate the attack objective by leveraging key output tokens as targets for suppression, enabling effective joint optimization of attack effectiveness and stealthiness. Additionally, we employ an iterative, progressive search strategy to maximize attack efficacy. Experiments show that SilentStriker significantly outperforms existing baselines, achieving successful attacks without compromising the naturalness of generated text.
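The token-based loss formulation the abstract describes can be sketched in a toy form. The sketch below is illustrative only (pure Python, hypothetical names; the paper's actual loss operates on LLM logits over a full vocabulary): it pairs an attack term that suppresses key answer tokens with a crude naturalness term standing in for the perplexity constraint.

```python
import math

def attack_loss(step_probs, key_token_ids, ppl_weight=0.1):
    """Toy sketch of a SilentStriker-style attack objective:
    minimize the probability mass on key answer tokens (attack term)
    while keeping the negative log-likelihood of the model's own
    top token low (a stand-in for the perplexity/naturalness term).
    `step_probs` is a list of {token_id: probability} dicts, one per
    generated position. All names here are illustrative, not the paper's."""
    suppress = 0.0
    nll = 0.0
    for probs in step_probs:
        # attack term: probability the model still assigns to key tokens
        suppress += sum(probs.get(t, 0.0) for t in key_token_ids)
        # naturalness term: NLL of the most likely token at this position
        nll += -math.log(max(probs.values()))
    return suppress + ppl_weight * nll
```

Minimizing this joint loss pushes probability away from key output tokens (breaking the task answer) while keeping the model confident in fluent continuations, which is the stealthiness trade-off the paper targets.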


Key Contributions

  • First stealthy bit-flip attack against LLMs (SilentStriker) that degrades task performance while preserving output naturalness, achieved by flipping ~50 bits out of billions of parameters
  • Token-based loss formulation that suppresses critical output tokens while constraining perplexity, enabling joint optimization of attack effectiveness and stealthiness without the naturalness collapse seen in prior BFA methods
  • Iterative progressive search strategy for identifying optimal attack locations, plus an improved bit-selection strategy for FP4-quantized models using LUT mapping
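A minimal greedy sketch of what an iterative progressive search over candidate bits might look like, on a toy list of INT8 weights (hedged assumption: the paper's procedure operates on quantized LLM layers with gradient-guided candidate ranking, which is not reproduced here):

```python
def flip_bit_int8(w, bit):
    """Flip one bit of an 8-bit two's-complement value (bit 0 = LSB)."""
    u = (w & 0xFF) ^ (1 << bit)
    return u - 256 if u >= 128 else u

def progressive_bit_search(weights, loss_fn, n_flips):
    """Greedy illustration of an iterative progressive search: each
    round, flip the single (weight, bit) pair that most reduces the
    attack loss; stop early if no flip helps. `loss_fn` maps a weight
    list to the attack loss being minimized."""
    flips = []
    for _ in range(n_flips):
        best, best_loss = None, loss_fn(weights)
        for i, w in enumerate(weights):
            for bit in range(8):
                trial = weights[:i] + [flip_bit_int8(w, bit)] + weights[i + 1:]
                loss = loss_fn(trial)
                if loss < best_loss:
                    best, best_loss = (i, bit), loss
        if best is None:
            break
        i, bit = best
        weights[i] = flip_bit_int8(weights[i], bit)  # commit the best flip
        flips.append(best)
    return flips
```

The round-by-round commitment is what makes the search "progressive": each chosen flip changes the landscape the next round searches, which is why a single-pass ranking of bits is not equivalent.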

🛡️ Threat Analysis

Model Poisoning

SilentStriker corrupts the parameters of a deployed LLM by flipping bits in DRAM, directly poisoning model weights to induce severe performance degradation — a form of model poisoning that targets the model's in-memory parameters rather than its training data or inputs.
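To see why a handful of DRAM bit flips is so damaging, consider one flip in a two's-complement INT8 weight of a quantized layer (an illustrative sketch with a hypothetical quantization scale, not the paper's code): flipping the sign/MSB moves the dequantized weight across nearly the full representable range.

```python
def flip_msb_int8(w):
    """Flip the sign/MSB of an INT8 two's-complement weight,
    modeling the effect of a single DRAM bit flip."""
    u = (w & 0xFF) ^ 0x80
    return u - 256 if u >= 128 else u

scale = 0.02                    # hypothetical per-channel quantization scale
w_q = 100                       # stored INT8 weight
w_fp_before = w_q * scale                  # dequantized weight: 2.0
w_fp_after = flip_msb_int8(w_q) * scale    # -0.56 after one bit flip
```

A single physical bit flip thus shifts one weight by `128 * scale` in the dequantized model, which is why carefully chosen flips in only ~50 positions suffice to wreck task accuracy.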


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
white_box, inference_time, targeted, digital
Datasets
GSM8K
Applications
large language models, instruction-following, mathematical reasoning