attack arXiv Dec 18, 2025 · Dec 2025
Lulu Xue, Shengshan Hu, Linqiang Qian et al. · Huazhong University of Science and Technology · Tsinghua University +4 more
Novel black-box MIA exploits dual-model access after unlearning to infer membership of retained data via likelihood ratio inference
Membership Inference Attack vision
Machine unlearning is a newly popularized technique for removing specific training data from a trained model, enabling it to comply with data deletion requests. While it protects the rights of users requesting unlearning, it also introduces new privacy risks. Prior works have primarily focused on the privacy of data that has been unlearned, while the risks to retained data remain largely unexplored. To address this gap, we focus on the privacy risks of retained data and, for the first time, reveal the vulnerabilities introduced by machine unlearning under the dual-view setting, where an adversary can query both the original and the unlearned models. From an information-theoretic perspective, we introduce the concept of privacy knowledge gain and demonstrate that the dual-view setting allows adversaries to obtain more information than querying either model alone, thereby amplifying privacy leakage. To effectively demonstrate this threat, we propose DVIA, a Dual-View Inference Attack, which extracts membership information on retained data using black-box queries to both models. DVIA eliminates the need to train an attack model and employs a lightweight likelihood ratio inference module for efficient inference. Experiments across different datasets and model architectures validate the effectiveness of DVIA and highlight the privacy risks inherent in the dual-view setting.
cnn transformer Huazhong University of Science and Technology · Tsinghua University · Nanyang Technological University +3 more
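As a rough illustration of likelihood ratio inference in the dual-view setting described in the abstract above, the sketch below scores a sample by summing one log-likelihood ratio per model. It is not the authors' DVIA implementation: the Gaussian model for confidence scores, the independence of the two views, and every name and number (`dual_view_llr`, `MEMBER_STATS`, `NONMEMBER_STATS`, the threshold) are assumptions made for illustration.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate normal distribution."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def dual_view_llr(conf_original, conf_unlearned, member_stats, nonmember_stats):
    """Sum the member-vs-non-member log-likelihood ratios of both views.

    member_stats and nonmember_stats each hold two (mean, std) pairs:
    one for the original model and one for the unlearned model.
    Treating the two views as independent is a simplifying assumption.
    """
    views = (conf_original, conf_unlearned)
    llr = 0.0
    for x, (m_mean, m_std), (n_mean, n_std) in zip(views, member_stats, nonmember_stats):
        llr += gaussian_logpdf(x, m_mean, m_std) - gaussian_logpdf(x, n_mean, n_std)
    return llr


def infer_membership(llr, threshold=0.0):
    """Predict 'member' when the combined evidence exceeds the threshold."""
    return llr > threshold


# Hypothetical reference statistics for the confidence a model assigns to
# the true label, as (mean, std) per view: (original model, unlearned model).
MEMBER_STATS = ((0.90, 0.10), (0.85, 0.10))
NONMEMBER_STATS = ((0.60, 0.20), (0.60, 0.20))
```

In this toy setup, a sample that both models classify with high confidence receives a positive score, and a sample with low confidence under both models receives a negative one. How the reference statistics would be estimated in practice is left open here.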
attack arXiv Feb 11, 2026 · 7w ago
Shuyu Chang, Haiping Huang, Yanjun Zhang et al. · Nanjing University of Posts and Telecommunications · State Key Laboratory of Tibetan Intelligence +5 more
Backdoor attack on code models using sharpness-aware training and Gumbel-Softmax triggers for cross-dataset transferability and stealthiness
Model Poisoning nlp
Code models are increasingly adopted in software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor attacks on code models face a fundamental trade-off between transferability and stealthiness. Static trigger-based attacks insert fixed dead code patterns that transfer well across models and datasets but are easily detected by code-specific defenses. In contrast, dynamic trigger-based attacks adaptively generate context-aware triggers to evade detection but suffer from poor cross-dataset transferability. Moreover, they rely on unrealistic assumptions of identical data distributions between poisoned and victim training data, limiting their practicality. To overcome these limitations, we propose Sharpness-aware Transferable Adversarial Backdoor (STAB), a novel attack that achieves both transferability and stealthiness without requiring complete victim data. STAB is motivated by the observation that adversarial perturbations in flat regions of the loss landscape transfer more effectively across datasets than those in sharp minima. To this end, we train a surrogate model using Sharpness-Aware Minimization to guide model parameters toward flat loss regions, and employ Gumbel-Softmax optimization to enable differentiable search over discrete trigger tokens for generating context-aware adversarial triggers. Experiments across three datasets and two code models show that STAB outperforms prior attacks in terms of transferability and stealthiness. It achieves a 73.2% average attack success rate after defense, outperforming static trigger-based attacks that fail under defense. STAB also surpasses the best dynamic trigger-based attack by 12.4% in cross-dataset attack success rate and maintains performance on clean inputs.
transformer Nanjing University of Posts and Telecommunications · State Key Laboratory of Tibetan Intelligence · Jiangsu Provincial Key Laboratory of Internet of Things Intelligent Perception and Computing +4 more
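The abstract above mentions Gumbel-Softmax optimization as the way to make the search over discrete trigger tokens differentiable. The sketch below shows only the generic Gumbel-Softmax relaxation in plain Python, under the assumption that each candidate token has a real-valued logit. It is not STAB itself: the function name, the temperature value, and the example logits are hypothetical, and the surrogate model and Sharpness-Aware Minimization step are omitted.

```python
import math
import random


def gumbel_softmax(logits, temperature=0.5, rng=random):
    """Draw a relaxed one-hot sample over candidate tokens.

    Adds Gumbel(0, 1) noise to each logit, divides by the temperature,
    and applies a softmax. Lower temperatures give samples closer to
    one-hot vectors.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for four candidate trigger tokens.
example_logits = [1.2, 0.3, -0.5, 0.9]
sample = gumbel_softmax(example_logits, temperature=0.5, rng=random.Random(0))
chosen_index = max(range(len(sample)), key=lambda i: sample[i])
```

Because the output is a probability vector and not a hard index, an autodiff framework can propagate gradients back to the logits, which is what makes the token search trainable. The discrete token is recovered with an argmax, as in `chosen_index`.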
attack arXiv Jan 17, 2026 · 11w ago
Xiaomei Zhang, Zhaoxi Zhang, Leo Yu Zhang et al. · Griffith University · University of Technology Sydney +1 more
Adversarial attack exploits visual token compression in VLMs by perturbing token importance rankings, causing failures only under compressed inference
Input Manipulation Attack Prompt Injection vision nlp multimodal
Visual token compression is widely adopted to improve the inference efficiency of Large Vision-Language Models (LVLMs), enabling their deployment in latency-sensitive and resource-constrained scenarios. However, existing work has mainly focused on efficiency and performance, while the security implications of visual token compression remain largely unexplored. In this work, we first reveal that visual token compression substantially degrades the robustness of LVLMs: models that are robust under uncompressed inference become highly vulnerable once compression is enabled. These vulnerabilities are state-specific; failure modes emerge only in the compressed setting and completely disappear when compression is disabled, making them particularly hidden and difficult to diagnose. By analyzing the key stages of the compression process, we identify instability in token importance ranking as the primary cause of this robustness degradation. Small and imperceptible perturbations can significantly alter token rankings, leading the compression mechanism to mistakenly discard task-critical information and ultimately causing model failure. Motivated by this observation, we propose a Compression-Aware Attack (CAA) to systematically study and exploit this vulnerability. CAA directly targets the token selection mechanism and induces failures exclusively under compressed inference. We further extend this approach to more realistic black-box settings and introduce Transfer CAA, where neither the target model nor the compression configuration is accessible. We also evaluate potential defenses and find that they provide only limited protection. Extensive experiments across models, datasets, and compression methods show that visual token compression significantly undermines robustness, revealing a previously overlooked efficiency-security trade-off.
vlm transformer Griffith University · University of Technology Sydney · University of Utah
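The ranking instability described in the abstract above can be seen in a toy example. The sketch below assumes a compression scheme that keeps the `k` visual tokens with the highest importance scores. The scores, the perturbation, and the function name `keep_top_k` are hypothetical and do not come from the paper.

```python
def keep_top_k(importance, k):
    """Return the indices of the k highest-scoring tokens, in original order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical importance scores for five visual tokens.
clean_scores = [0.31, 0.30, 0.05, 0.29, 0.05]
# The same scores after a small shift (at most 0.02 per token).
perturbed_scores = [0.31, 0.28, 0.05, 0.29, 0.07]

kept_clean = keep_top_k(clean_scores, k=2)
kept_perturbed = keep_top_k(perturbed_scores, k=2)
dropped = sorted(set(kept_clean) - set(kept_perturbed))
```

When several tokens have nearly equal scores, a shift of a few hundredths is enough to change which tokens survive compression. If the dropped token carried task-critical content, the model would fail only in the compressed setting, since uncompressed inference still sees every token.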
defense arXiv Jan 28, 2026 · 9w ago
Lulu Xue, Shengshan Hu, Wei Lu et al. · Huazhong University of Science and Technology · Institute of Guizhou Aerospace Measuring and Testing Technology +2 more
Defends machine unlearning against inversion attacks that reconstruct erased training data via cosine-space perturbations
Model Inversion Attack vision
Machine unlearning is an emerging technique that aims to remove the influence of specific data from trained models, thereby enhancing privacy protection. However, recent research has uncovered critical privacy vulnerabilities, showing that adversaries can exploit unlearning inversion to reconstruct data that was intended to be erased. Despite the severity of this threat, dedicated defenses remain lacking. To address this gap, we propose UnlearnShield, the first defense specifically tailored to counter unlearning inversion. UnlearnShield introduces directional perturbations in the cosine representation space and regulates them through a constraint module to jointly preserve model accuracy and forgetting efficacy, thereby reducing inversion risk while maintaining utility. Experiments demonstrate that it achieves a good trade-off among privacy protection, accuracy, and forgetting.
cnn transformer Huazhong University of Science and Technology · Institute of Guizhou Aerospace Measuring and Testing Technology · University of Technology Sydney +1 more