
Rotated Robustness: A Training-Free Defense against Bit-Flip Attacks on Large Language Models

Deng Liu, Song Chen

Published on arXiv: 2603.16382

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Under 50 Progressive Bit Search flips, RoR maintains 43.9% MMLU accuracy (vs. 45.2% unattacked) on Llama-2-7B while competing defenses collapse to random guessing; it also inflates SPFA attack complexity from ~10 bits to >17,000 bits.

Rotated Robustness (RoR)

Novel technique introduced


Hardware faults, specifically bit-flips in quantized weights, pose a severe reliability threat to Large Language Models (LLMs), often triggering catastrophic model collapse. We demonstrate that this vulnerability fundamentally stems from the spatial alignment between sensitive weight bits and extreme activation outliers, which causes a single hardware fault to be massively amplified. To address this, we propose Rotated Robustness (RoR), a training-free defense utilizing orthogonal Householder transformations. By applying an orthogonal rotation to the activation space, RoR geometrically smooths extreme outliers across all feature dimensions. This mechanism breaks the alignment between outliers and vulnerable weights while mathematically guaranteeing preservation of the original model's accuracy. Extensive empirical evaluations across the Llama-2/3, OPT, and Qwen families demonstrate the superior reliability of our approach. Under random bit-flip attacks, RoR reduces the stochastic collapse rate from 3.15% to 0.00% on Qwen2.5-7B. Furthermore, under severe targeted attacks with 50 Progressive Bit Search flips, RoR sustains robust reasoning on Llama-2-7B, maintaining 43.9% MMLU accuracy that nearly matches its 45.2% unattacked accuracy, while competing defenses collapse to random guessing. Most notably, against the Single-Point Fault Attack (SPFA), the most aggressive targeted threat, RoR inflates the attack complexity from a few bits to over 17,000 precise bit-flips. With a negligible storage overhead of 0.31% and a minimal inference latency increase of 9.1% on Llama-2-7B, RoR achieves lossless robustness, providing a practical and highly reliable defense for LLM deployment.
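As a toy illustration of the mechanism described above (not the paper's implementation), a single Householder reflection can redistribute an extreme activation outlier across all channels while, by orthogonality, leaving the layer's output unchanged up to floating-point error. The dimensions, outlier magnitude, and choice of reflection direction below are illustrative assumptions:

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)

# Toy activation with one extreme outlier channel.
x = np.ones(d)
x[0] = 100.0

# Householder reflection H = I - 2 u u^T that maps e0 onto the
# flat direction w = ones/sqrt(d), spreading the outlier channel
# across all feature dimensions.
w = np.ones(d) / np.sqrt(d)
e0 = np.zeros(d)
e0[0] = 1.0
u = e0 - w
u /= np.linalg.norm(u)
H = np.eye(d) - 2.0 * np.outer(u, u)

x_rot = H @ x
print(np.abs(x).max(), np.abs(x_rot).max())  # peak magnitude shrinks markedly

# Orthogonality guarantees the layer output is preserved:
# (W H^T)(H x) = W (H^T H) x = W x.
W = rng.standard_normal((4, d))
y = W @ x
y_rot = (W @ H.T) @ (H @ x)  # rotated weights times rotated activations
print(np.allclose(y, y_rot))  # True
```

Because the rotated weights W @ H.T can be precomputed and stored, only the input rotation H @ x adds runtime cost, which is consistent with the small latency overhead the paper reports.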


Key Contributions

  • Rotated Robustness (RoR): orthogonal Householder transformations that geometrically smooth activation outliers to break alignment with vulnerable weight bits
  • Training-free defense with mathematically guaranteed accuracy preservation and exponential attack complexity inflation (17,000+ bits for SPFA)
  • 0.31% storage overhead and 9.1% latency increase while reducing stochastic collapse rate from 3.15% to 0.00% on Qwen2.5-7B

🛡️ Threat Analysis

Model Poisoning

Bit-flip attacks on quantized weights are a form of model poisoning — directly manipulating model parameters to cause targeted malicious behavior (catastrophic collapse or wrong outputs). The paper defends against targeted bit-flip attacks (Progressive Bit Search, Single-Point Fault Attack) that inject hidden malicious behavior into the model by flipping specific weight bits.
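To see why alignment with outliers is dangerous, consider a hypothetical int8-quantized layer: flipping the most significant bit (the two's-complement sign bit) of one stored weight byte perturbs the dequantized value by 128 × scale, and the resulting output error scales with the activation magnitude on that channel. All values below are made up for illustration:

```python
import numpy as np

scale = 0.01  # hypothetical dequantization scale
w_q = np.array([10, -3, 7, 2, -5, 1, 4, -2], dtype=np.int8)

def flip_msb(q, idx):
    """Flip bit 7 (the two's-complement sign bit) of one stored weight byte."""
    out = q.copy()
    out.view(np.uint8)[idx] ^= 0x80
    return out

w = w_q.astype(np.float64) * scale
w_flip = flip_msb(w_q, 0).astype(np.float64) * scale
delta = w_flip - w  # the injected fault, confined to channel 0

# Same input energy, two layouts: one extreme outlier vs. evenly spread.
x_outlier = np.ones(8)
x_outlier[0] = 100.0
x_smooth = np.full(8, np.linalg.norm(x_outlier) / np.sqrt(8))

err_aligned = abs(delta @ x_outlier)   # fault times outlier: large output error
err_smoothed = abs(delta @ x_smooth)   # same fault, diluted activation
print(err_aligned, err_smoothed)
```

In this toy setting the smoothed layout only dilutes the error by roughly a factor of sqrt(d); in real LLM layers, where outlier channels can sit orders of magnitude above the median, the dilution from breaking the alignment is correspondingly larger.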


Details

Domains
nlp
Model Types
llm, transformer
Threat Tags
inference_time, targeted
Datasets
MMLU
Applications
language modeling, reasoning tasks