SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models
Haotian Xu 1,2, Qingsong Peng 1,2, Jie Shi 2, Huadi Zheng 2, Yu Li 1,2, Zhuoyang Chen 1
Published on arXiv (2509.17371)
Model Poisoning
OWASP ML Top 10 — ML10
Key Finding
On LLaMA-3.1-8B-Instruct (INT8), 50 bit flips drop GSM8K accuracy from 65.7% to 7.6% while naturalness score falls only from 66.0 to 61.1, versus GenBFA which reaches 0% accuracy but causes complete output collapse (naturalness 0, perplexity 5.5×10⁵)
SilentStriker
Novel technique introduced
The rapid adoption of large language models (LLMs) in critical domains has spurred extensive research into their security. While input manipulation attacks (e.g., prompt injection) have been well studied, Bit-Flip Attacks (BFAs) -- which exploit hardware vulnerabilities to corrupt model parameters and cause severe performance degradation -- have received far less attention. Existing BFA methods suffer from a key limitation: they fail to balance performance degradation against output naturalness, making them prone to discovery. In this paper, we introduce SilentStriker, the first stealthy bit-flip attack against LLMs that effectively degrades task performance while maintaining output naturalness. Our core contribution lies in addressing the challenge of designing effective loss functions for LLMs, whose outputs have variable length and a vast token space. Unlike prior approaches that formulate the attack loss around output perplexity, which inevitably degrades output naturalness, we reformulate the attack objective by treating key output tokens as suppression targets, enabling effective joint optimization of attack effectiveness and stealthiness. Additionally, we employ an iterative, progressive search strategy to maximize attack efficacy. Experiments show that SilentStriker significantly outperforms existing baselines, achieving successful attacks without compromising the naturalness of generated text.
Key Contributions
- First stealthy bit-flip attack against LLMs (SilentStriker) that degrades task performance while preserving output naturalness, achieved by flipping ~50 bits out of billions of parameters
- Token-based loss formulation that suppresses critical output tokens while constraining perplexity, enabling joint optimization of attack effectiveness and stealthiness without the naturalness collapse seen in prior BFA methods
- Iterative progressive search strategy for identifying optimal attack locations, plus an improved bit-selection strategy for FP4-quantized models using LUT mapping
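The token-based loss in the second contribution can be sketched as follows. This is a minimal, illustrative formulation, not the paper's exact objective: the function names, the KL-divergence stealth term, and the weighting `lam` are all assumptions. The idea is that minimizing the loss drives the probability of key answer tokens toward zero (attack term) while keeping the output distribution close to its pre-attack shape, which keeps perplexity and hence naturalness roughly intact (stealth term).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stealthy_attack_loss(logits, key_token_ids, ref_probs, lam=1.0):
    """Illustrative SilentStriker-style objective (names/weights assumed).

    Attack term: sum of log-probs of the key output tokens; minimizing it
    pushes those probabilities toward zero, suppressing the correct answer.
    Stealth term: KL divergence from the pre-attack distribution ref_probs,
    penalizing changes that would visibly degrade output naturalness.
    """
    probs = softmax(logits)
    suppress = sum(math.log(probs[t] + 1e-12) for t in key_token_ids)
    kl = sum(p * math.log((p + 1e-12) / (q + 1e-12))
             for p, q in zip(ref_probs, probs))
    return suppress + lam * kl
```

A perplexity-based loss, by contrast, would reward making *all* tokens unlikely, which is exactly the output collapse the paper reports for GenBFA; suppressing only the key tokens is what lets the model keep producing fluent text.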
🛡️ Threat Analysis
SilentStriker corrupts deployed LLM model parameters by flipping bits in DRAM, directly poisoning model weights to induce severe performance degradation — a form of model poisoning targeting the model's internal parameters rather than training data or inputs.
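The underlying fault model can be illustrated by flipping a single bit of a two's-complement INT8 weight, as a Rowhammer-style DRAM fault would. This is a hedged sketch; the helper name is hypothetical and real attacks flip bits in memory rather than via software.

```python
def flip_bit_int8(weight, bit):
    """Flip one bit of a signed 8-bit (two's-complement) weight.

    Illustrates why a single fault matters: flipping the sign/high bit
    of an INT8 value shifts it by 128, a large change for a quantized
    model weight.
    """
    assert -128 <= weight <= 127 and 0 <= bit <= 7
    raw = weight & 0xFF      # view the value as its two's-complement byte
    raw ^= (1 << bit)        # the single-bit hardware fault
    return raw - 256 if raw >= 128 else raw
```

For example, flipping bit 7 of the weight 3 yields -125, turning a small positive weight into a large negative one; SilentStriker's search is about choosing the ~50 such flips that break task accuracy without making the damage visible in the output text.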