attack arXiv Mar 1, 2026
Huajie Chen, Tianqing Zhu, Yuchen Zhong et al. · City University of Macau · CISPA Helmholtz Center for Information Security +2 more
Reveals that dataset distillation leaks training data via three-stage attack: architecture inference, membership inference, and model inversion
Model Inversion Attack · Membership Inference Attack · vision
Dataset distillation compresses a large real dataset into a small synthetic one, enabling models trained on the synthetic data to achieve performance comparable to those trained on the real data. Although synthetic datasets are assumed to be privacy-preserving, we show that existing distillation methods can cause severe privacy leakage: because synthetic datasets implicitly encode the weight trajectories of the distilled model, they become over-informative and exploitable by adversaries. To expose this risk, we introduce the Information Revelation Attack (IRA) against state-of-the-art distillation techniques. Experiments show that IRA accurately predicts both the distillation algorithm and model architecture, and can successfully infer membership and recover sensitive samples from the real dataset.
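The membership-inference stage of an IRA-style pipeline can be illustrated with a classic loss-threshold rule: samples whose loss under the target model falls below a threshold calibrated on shadow data are guessed to be training members. This is a generic sketch of that stage, not the paper's actual attack; all function names and the midpoint calibration rule are illustrative assumptions.

```python
# Minimal membership-inference sketch (hypothetical, not the IRA method):
# low per-sample loss under the target model suggests the sample was seen
# during training.

def calibrate_threshold(shadow_member_losses, shadow_nonmember_losses):
    """Midpoint between mean member / non-member loss on shadow data."""
    mean_in = sum(shadow_member_losses) / len(shadow_member_losses)
    mean_out = sum(shadow_nonmember_losses) / len(shadow_nonmember_losses)
    return (mean_in + mean_out) / 2

def infer_membership(losses, threshold):
    """Predict True for samples whose loss is below the threshold."""
    return [loss < threshold for loss in losses]

t = calibrate_threshold([0.1, 0.2, 0.15], [0.9, 1.1, 1.0])
print(infer_membership([0.12, 0.95], t))  # low loss -> likely member
```

Real attacks replace the midpoint rule with per-sample calibrated thresholds or likelihood-ratio tests, but the decision structure is the same.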
cnn diffusion City University of Macau · CISPA Helmholtz Center for Information Security · University of Technology Sydney +1 more
attack arXiv Mar 1, 2026
Huajie Chen, Tianqing Zhu, Hailin Yang et al. · City University of Macau · CISPA Helmholtz Center for Information Security +1 more
Pixel-wise reconstruction attack removes AI-image watermarks without querying detectors or knowing the watermarking scheme
Output Integrity Attack · vision · generative
Watermarking has emerged as a key defense against the misuse of machine-generated images (MGIs). Yet the robustness of these protections remains underexplored. To reveal the limits of SOTA proactive image watermarking defenses, we propose HIDE&SEEK (HS), a suite of versatile and cost-effective attacks that reliably remove embedded watermarks while preserving high visual fidelity.
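The key property claimed above, removal without querying a detector or knowing the scheme, can be conveyed with a toy example: if a hypothetical watermark lives in low-order pixel bits, re-quantizing every pixel erases it while barely changing the image. HIDE&SEEK's actual attacks reconstruct the image pixel-wise; this LSB example is only an assumed stand-in that shows why no detector oracle is needed.

```python
# Toy detector-free watermark removal (illustrative LSB scheme, not
# HIDE&SEEK): the attacker never sees the watermark bits or a detector.

def embed_lsb(pixels, bits):
    """Hide one watermark bit in the least-significant bit of each pixel."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def remove_by_requantization(pixels):
    """Round each pixel to the nearest even value, wiping any LSB payload."""
    return [p & ~1 for p in pixels]

marked = embed_lsb([120, 53, 200, 77], [1, 0, 1, 1])
cleaned = remove_by_requantization(marked)
print([p & 1 for p in cleaned])  # payload destroyed: [0, 0, 0, 0]
```

The per-pixel change is at most 1 gray level, mirroring the "high visual fidelity" constraint; robust watermarks require stronger reconstruction, which is what the paper's attacks supply.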
diffusion gan City University of Macau · CISPA Helmholtz Center for Information Security · Guangzhou University
attack arXiv Mar 5, 2026
Yuchen Shi, Huajie Chen, Heng Xu et al. · City University of Macau · Jinan University +1 more
Poisons distilled synthetic datasets to embed hidden hijacking tasks in models fine-tuned via transfer learning
Data Poisoning Attack · Transfer Learning Attack · vision
Transfer learning leverages knowledge from pre-trained models to solve new tasks with limited data and computational resources, while dataset distillation synthesizes a compact dataset that preserves the critical information of the original large dataset. Combining the two is therefore attractive in practice. However, an overlooked security threat lurks in transfer learning on synthetic datasets produced by dataset distillation: an adversary can mount a model hijacking attack with only a few poisoned samples in the synthetic dataset. To reveal this threat, we propose the Osmosis Distillation (OD) attack, a novel model hijacking strategy that compromises deep learning models with a minimal number of samples. Comprehensive evaluations on various datasets demonstrate that the OD attack attains high attack success rates on the hidden task while preserving high model utility on the original task. Furthermore, the distilled osmosis set enables model hijacking across diverse model architectures, carrying the attack into transfer learning with considerable attack performance and model utility. We argue that awareness of the risks of third-party synthetic datasets in transfer learning must be raised.
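The core mechanic described above, a few poisoned synthetic samples teaching a hidden task, can be sketched as relabeling a small budget of distilled samples and stamping them with a trigger, so any model fine-tuned on the set learns the hidden mapping alongside the original task. The trigger encoding, names, and budget rule here are illustrative assumptions, not the OD attack itself.

```python
# Hypothetical few-sample poisoning of a distilled synthetic dataset
# (illustrative sketch, not the Osmosis Distillation method).

TRIGGER = 255  # assumed trigger value appended to a sample's features

def poison(synthetic_set, hijack_label, budget):
    """Relabel up to `budget` samples and stamp them with the trigger."""
    poisoned = list(synthetic_set)
    for i in range(min(budget, len(poisoned))):
        features, _ = poisoned[i]
        poisoned[i] = (features + [TRIGGER], hijack_label)
    return poisoned

clean = [([1, 2], 0), ([3, 4], 1), ([5, 6], 0)]
out = poison(clean, hijack_label=9, budget=1)
print(out[0])  # one sample now carries the hidden task: ([1, 2, 255], 9)
```

Because distilled sets are tiny, a budget of one or two samples is a large fraction of the data, which is why distillation amplifies this threat relative to poisoning a full-size dataset.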
cnn transformer City University of Macau · Jinan University · The University of Sydney
defense arXiv Mar 4, 2026
Yizhe Xie, Congcong Zhu, Xinyue Zhang et al. · City University of Macau · Minzu University of China
Models and defends against injected error-seed cascades in LLM multi-agent systems via genealogy-graph message governance
Prompt Injection · Excessive Agency · nlp
Large Language Model-based Multi-Agent Systems (LLM-MAS) are increasingly applied to complex collaborative scenarios. However, their collaborative mechanisms may cause minor inaccuracies to gradually solidify into system-level false consensus through iteration. Such risks are difficult to trace since errors can propagate and amplify through message dependencies. Existing protections often rely on single-agent validation or require modifications to the collaboration architecture, which can weaken effective information flow and may not align with natural collaboration processes in real tasks. To address this, we propose a propagation dynamics model tailored for LLM-MAS that abstracts collaboration as a directed dependency graph and provides an early-stage risk criterion to characterize amplification risk. Through experiments on six mainstream frameworks, we identify three vulnerability classes: cascade amplification, topological sensitivity, and consensus inertia. We further instantiate an attack where injecting just a single atomic error seed leads to widespread failure. In response, we introduce a genealogy-graph-based governance layer, implemented as a message-layer plugin, that suppresses both endogenous and exogenous error amplification without altering the collaboration architecture. Experiments show that this approach raises the defense success rate from a baseline of 0.32 to over 0.89 and significantly mitigates the cascading spread of minor errors.
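The cascade and the governance layer can be sketched on the paper's own abstraction, a directed dependency graph: one injected error seed reaches every agent downstream of it via message dependencies, while a genealogy-aware filter that drops messages whose ancestry traces to an untrusted seed cuts the cascade off. The graph, node names, and quarantine rule below are illustrative assumptions, not the paper's plugin.

```python
# Cascade propagation over a directed agent-dependency graph, with a
# hypothetical genealogy-based quarantine (illustrative sketch only).
from collections import deque

def cascade(graph, seed):
    """Agents reachable from the error seed via message dependencies."""
    infected, queue = {seed}, deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in infected:
                infected.add(nxt)
                queue.append(nxt)
    return infected

def governed_cascade(graph, seed, quarantined):
    """Same propagation, but quarantined agents drop tainted messages."""
    infected, queue = {seed}, deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in infected and nxt not in quarantined:
                infected.add(nxt)
                queue.append(nxt)
    return infected

g = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(sorted(cascade(g, "A")))                       # single seed taints all
print(sorted(governed_cascade(g, "A", {"B", "C"})))  # cascade cut at source
```

The contrast between the two traversals mirrors the paper's finding: without governance, one atomic seed taints everything reachable; with message-layer filtering, amplification stops without changing the graph (the collaboration architecture) itself.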
llm transformer City University of Macau · Minzu University of China