
COBRA: Catastrophic Bit-flip Reliability Analysis of State-Space Models

Sanjay Das 1, Swastik Bhattacharya 1, Shamik Kundu 2, Arnab Raha 2, Souvik Kundu 2, Kanad Basu 3

0 citations · 34 references · arXiv


Published on arXiv

2512.15778

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Flipping a single critical bit in Mamba-1.4b reduces LAMBADA accuracy from 74.64% to 0% and increases perplexity from 18.94 to 3.75×10^6, demonstrating extreme SSM fragility to BFAs.

COBRA (RAMBO)

Novel technique introduced


State-space models (SSMs), exemplified by the Mamba architecture, have recently emerged as state-of-the-art sequence-modeling frameworks, offering linear-time scalability together with strong performance in long-context settings. Owing to their unique combination of efficiency, scalability, and expressive capacity, SSMs have become compelling alternatives to transformer-based models, which suffer from the quadratic computational and memory costs of attention mechanisms. As SSMs are increasingly deployed in real-world applications, it is critical to assess their susceptibility to both software- and hardware-level threats to ensure secure and reliable operation. Among such threats, hardware-induced bit-flip attacks (BFAs) pose a particularly severe risk by corrupting model parameters through memory faults, thereby undermining model accuracy and functional integrity. To investigate this vulnerability, we introduce RAMBO, the first BFA framework specifically designed to target Mamba-based architectures. Through experiments on the Mamba-1.4b model with the LAMBADA benchmark, a cloze-style word-prediction task, we demonstrate that flipping merely a single critical bit can catastrophically reduce accuracy from 74.64% to 0% and increase perplexity from 18.94 to 3.75×10^6. These results demonstrate the pronounced fragility of SSMs to adversarial perturbations.


Key Contributions

  • COBRA/RAMBO: the first bit-flip attack (BFA) framework specifically designed to target Mamba-based SSM architectures, identifying critical weight bits to flip
  • Empirical demonstration that flipping a single bit in Mamba-1.4b catastrophically reduces accuracy from 74.64% to 0% and inflates perplexity from 18.94 to 3.75×10^6 on LAMBADA
  • Analysis of pronounced architectural fragility in SSMs to adversarial hardware-level parameter corruption, motivating fault-aware design for SSM deployments

🛡️ Threat Analysis

Model Poisoning

BFAs corrupt model weight parameters post-deployment to catastrophically degrade model accuracy, a form of model parameter poisoning/tampering. While distinct from classical backdoor/trojan attacks (there is no hidden trigger), Model Poisoning is the closest OWASP category: the attack directly corrupts model parameters to undermine functional integrity. The framework specifically identifies and targets critical weight bits to maximize model degradation.
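The paper does not spell out its search procedure here, but critical-bit identification in BFA work is commonly a greedy search: flip each candidate bit, measure the loss increase, and keep the most damaging flip. A toy sketch under that assumption (the `flip_bit`, `loss`, and `most_critical_bit` names, the linear "model", and the data are all hypothetical, not COBRA's actual algorithm):

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    # Flip one bit of a float32 (31 = sign, 30..23 = exponent, 22..0 = mantissa).
    (i,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", i ^ (1 << bit)))[0]

def loss(weights, data):
    # Toy squared-error loss for a one-layer linear "model" (stand-in for
    # the real evaluation objective, e.g. perplexity on LAMBADA).
    return sum((sum(w * x for w, x in zip(weights, xs)) - y) ** 2
               for xs, y in data)

def most_critical_bit(weights, data, bit=30):
    """Greedy critical-bit search: flip the exponent MSB of each weight in
    turn and return the flip that degrades the loss the most."""
    best_idx, best_loss = None, loss(weights, data)
    for i in range(len(weights)):
        trial = list(weights)
        trial[i] = flip_bit(trial[i], bit)
        trial_loss = loss(trial, data)
        if trial_loss > best_loss:
            best_idx, best_loss = i, trial_loss
    return best_idx, best_loss

weights = [0.5, -0.25, 0.125]            # hypothetical model weights
data = [([1.0, 0.5, 0.5], 0.4)]          # hypothetical (inputs, target) pair
idx, degraded = most_critical_bit(weights, data)
```

A real attack would rank bits across millions of parameters (often pruned by gradient magnitude first), but the principle is the same: a single well-chosen flip dominates the loss.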


Details

Domains
nlp
Model Types
llm
Threat Tags
white_box · inference_time · targeted · digital
Datasets
LAMBADA
Applications
language modeling · sequence modeling