attack 2025

SASER: Stego attacks on open-source LLMs

Ming Tan ¹, Wei Li ², Hu Tao ¹, Hailong Ma ¹, Aodi Liu ¹, Qian Chen ², Zilong Wang ²

¹ Information Engineering University

² Xidian University

0 citations · 41 references · arXiv

Published on arXiv

2510.10486

Model Poisoning

OWASP ML Top 10 — ML10

Key Finding

SASER achieves 100% attack success rate on both quantized and non-quantized LLaMA2-7B and ChatGLM3-6B while outperforming existing DNN stego attacks in stealth rate by up to 98.1%

Novel technique introduced: SASER

Open-source large language models (LLMs) have demonstrated considerable dominance over proprietary LLMs in resolving neural processing tasks, thanks to their collaborative and sharing nature. Although full access to source code, model parameters, and training data lays the groundwork for transparency, we argue that such full access is vulnerable to stego attacks, whose ill effects are not fully understood. In this paper, we systematically formalize stego attacks on open-source LLMs by enumerating all possible threat models associated with adversary objectives, knowledge, and capabilities. Among these, the threat posed by adversaries with internal knowledge, who inject payloads and triggers during the model sharing phase, is of practical interest. We go further and propose the first stego attack on open-source LLMs, dubbed SASER, which operates by sequentially identifying targeted parameters, embedding payloads, injecting triggers, and executing payloads. In particular, SASER enhances attack robustness against quantization-based local deployment by de-quantizing the embedded payloads. In addition, to achieve stealthiness, SASER devises a performance-aware importance metric to identify the targeted parameters that cause the least degradation of model performance. Extensive experiments on LLaMA2-7B and ChatGLM3-6B, without quantization, show that the stealth rate of SASER outperforms existing stego attacks (for general DNNs) by up to 98.1%, while achieving the same attack success rate (ASR) of 100%. More importantly, SASER improves ASR on quantized models from 0 to 100% in all settings. We call for investigation of countermeasures against SASER in view of its significant attack effectiveness.

Key Contributions

First systematic formalization of stego attack threat models for open-source LLMs, covering adversary objectives, knowledge, and capabilities
SASER attack method: a performance-aware importance metric identifies target parameters for payload embedding with minimal model degradation, achieving a stealth rate up to 98.1% higher than prior DNN stego attacks at the same 100% ASR
De-quantization technique that recovers embedded payloads after quantization, improving ASR on quantized LLMs from 0% to 100% across all settings
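
Why quantization matters for the last contribution can be illustrated with a small sketch. It is not the paper's method. It assumes symmetric per-tensor int8 quantization and assumes that a naive stego scheme stores data in the low-order bits of float32 weights; neither detail is stated in this summary. The helper names (`f32_bits`, `quantize_roundtrip`, `low_bits_changed`) and the synthetic weights are hypothetical. The sketch only measures how many weights keep their low-order bits after a quantize/de-quantize round trip.

```python
import random
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_bits(x: float) -> int:
    """Return the IEEE-754 float32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def quantize_roundtrip(weights, num_bits=8):
    """Quantize then de-quantize (assumed scheme: symmetric, per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1                   # 127 for int8
    scale = max(abs(w) for w in weights) / qmax      # one scale for the whole tensor
    return [to_f32(round(w / scale) * scale) for w in weights]


def low_bits_changed(original, restored, k=8):
    """Count weights whose k lowest float32 bits differ after the round trip."""
    mask = (1 << k) - 1
    return sum(
        (f32_bits(a) & mask) != (f32_bits(b) & mask)
        for a, b in zip(original, restored)
    )


random.seed(0)
# Synthetic stand-in for a weight tensor; real LLM weights are not used here.
weights = [to_f32(random.gauss(0.0, 0.02)) for _ in range(1000)]
restored = quantize_roundtrip(weights)
changed = low_bits_changed(weights, restored)
print(f"{changed}/{len(weights)} weights changed in their low-order bits")
```

Under these assumptions nearly every weight loses its original low-order bits, which is consistent with the reported 0% ASR of prior attacks on quantized models and suggests why a quantization-aware embedding step is needed.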

🛡️ Threat Analysis

Model Poisoning

SASER is a backdoor/trojan attack: it conceals trigger-activated malicious payloads within LLM model parameters (weight-space steganography). The model behaves normally until a crafted trigger activates the embedded payload. The primary contribution is the payload-embedding and trigger-injection technique in model weights — supply chain distribution via model hubs is the threat context, not the primary contribution, per the ML06 exclusion rule for weight manipulation attacks.

Details

Domains

nlp

Model Types

llm, transformer

Threat Tags

white_box, training_time, targeted

Datasets

LLaMA2-7B, ChatGLM3-6B

Applications

open-source llm deployment, model sharing platforms
