
COBRA: Catastrophic Bit-flip Reliability Analysis of State-Space Models

Sanjay Das 1, Swastik Bhattacharya 1, Shamik Kundu 2, Arnab Raha 2, Souvik Kundu 2, Kanad Basu 3

0 citations · 34 references · arXiv


Published on arXiv

2512.15778

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

Flipping a single critical bit in Mamba-1.4b reduces LAMBADA accuracy from 74.64% to 0% and increases perplexity from 18.94 to 3.75×10^6, demonstrating extreme SSM fragility to BFAs.

COBRA (RAMBO)

Novel technique introduced


State-space models (SSMs), exemplified by the Mamba architecture, have recently emerged as state-of-the-art sequence-modeling frameworks, offering linear-time scalability together with strong performance in long-context settings. Owing to their unique combination of efficiency, scalability, and expressive capacity, SSMs have become compelling alternatives to transformer-based models, which suffer from the quadratic computational and memory costs of attention mechanisms. As SSMs are increasingly deployed in real-world applications, it is critical to assess their susceptibility to both software- and hardware-level threats to ensure secure and reliable operation. Among such threats, hardware-induced bit-flip attacks (BFAs) pose a particularly severe risk by corrupting model parameters through memory faults, thereby undermining model accuracy and functional integrity. To investigate this vulnerability, we introduce RAMBO, the first BFA framework specifically designed to target Mamba-based architectures. Through experiments on the Mamba-1.4b model with the LAMBADA benchmark, a cloze-style word-prediction task, we demonstrate that flipping merely a single critical bit can catastrophically reduce accuracy from 74.64% to 0% and increase perplexity from 18.94 to 3.75×10^6. These results demonstrate the pronounced fragility of SSMs to adversarial perturbations.


Key Contributions

  • COBRA/RAMBO: the first bit-flip attack (BFA) framework specifically designed to target Mamba-based SSM architectures, identifying critical weight bits to flip
  • Empirical demonstration that flipping a single bit in Mamba-1.4b catastrophically reduces accuracy from 74.64% to 0% and inflates perplexity from 18.94 to 3.75×10^6 on LAMBADA
  • Analysis of pronounced architectural fragility in SSMs to adversarial hardware-level parameter corruption, motivating fault-aware design for SSM deployments

🛡️ Threat Analysis

Model Poisoning

BFAs corrupt model weight parameters post-deployment to catastrophically degrade model accuracy, a form of model parameter poisoning/tampering. While distinct from classical backdoor/trojan attacks (there is no hidden trigger), Model Poisoning is the closest OWASP category: the attack directly corrupts model parameters to undermine functional integrity. The framework specifically identifies and targets critical weight bits to maximize model degradation.
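The paper does not spell out its search procedure here, but critical-bit identification in BFA work is commonly a greedy search: flip each candidate bit, measure the loss increase, and keep the most damaging flip. A toy sketch under that assumption (the `flip_bit`, `loss`, and `most_critical_bit` names, the linear "model", and the data are all hypothetical, not COBRA's actual algorithm):

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    # Flip one bit of a float32 (31 = sign, 30..23 = exponent, 22..0 = mantissa).
    (i,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", i ^ (1 << bit)))[0]

def loss(weights, data):
    # Toy squared-error loss for a one-layer linear "model" (stand-in for
    # the real evaluation objective, e.g. perplexity on LAMBADA).
    return sum((sum(w * x for w, x in zip(weights, xs)) - y) ** 2
               for xs, y in data)

def most_critical_bit(weights, data, bit=30):
    """Greedy critical-bit search: flip the exponent MSB of each weight in
    turn and return the flip that degrades the loss the most."""
    best_idx, best_loss = None, loss(weights, data)
    for i in range(len(weights)):
        trial = list(weights)
        trial[i] = flip_bit(trial[i], bit)
        trial_loss = loss(trial, data)
        if trial_loss > best_loss:
            best_idx, best_loss = i, trial_loss
    return best_idx, best_loss

weights = [0.5, -0.25, 0.125]            # hypothetical model weights
data = [([1.0, 0.5, 0.5], 0.4)]          # hypothetical (inputs, target) pair
idx, degraded = most_critical_bit(weights, data)
```

A real attack would rank bits across millions of parameters (often pruned by gradient magnitude first), but the principle is the same: a single well-chosen flip dominates the loss.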


Details

Domains
nlp
Model Types
llm
Threat Tags
white_box · inference_time · targeted · digital
Datasets
LAMBADA
Applications
language modeling · sequence modeling