ML Security Papers

Latest papers

23 papers

defense arXiv Apr 23, 2026 · 28d ago

CSC: Turning the Adversary's Poison against Itself

Yuchen Shi, Xin Guo, Huajie Chen et al. · City University of Macau · University of Technology Sydney

Detects poisoned training samples via early-epoch clustering and neutralizes backdoors by relabeling them to a virtual class

Model Poisoning vision

PDF

attack arXiv Apr 16, 2026 · 5w ago

Physically-Induced Atmospheric Adversarial Perturbations: Enhancing Transferability and Robustness in Remote Sensing Image Classification

Weiwei Zhuang, Wangze Xie, Qi Zhang et al. · Xiamen University of Technology · City University of Macau +8 more

Generates physically plausible fog-based adversarial perturbations for remote sensing classifiers with high transferability and defense robustness

Input Manipulation Attack vision

PDF

attack arXiv Apr 11, 2026 · 5w ago

Seeing No Evil: Blinding Large Vision-Language Models to Safety Instructions via Adversarial Attention Hijacking

Jingru Li, Wei Ren, Tianqing Zhu · China University of Geosciences · City University of Macau

Adversarial attack on VLMs that suppresses attention to safety prompts, achieving 94% jailbreak success via attention manipulation

Input Manipulation Attack Prompt Injection multimodalvisionnlp

PDF

attack arXiv Apr 10, 2026 · 5w ago

Unreal Thinking: Chain-of-Thought Hijacking via Two-stage Backdoor

Wenhan Chang, Tianqing Zhu, Ping Xiong et al. · Zhongnan University of Economics and Law · City University of Macau

Backdoor attack embedding triggers in lightweight adapters that hijack LLM reasoning chains to display malicious thought processes

Model Poisoning AI Supply Chain Attacks Prompt Injection nlp

PDF Code

attack arXiv Mar 18, 2026 · 9w ago

ARES: Scalable and Practical Gradient Inversion Attack in Federated Learning through Activation Recovery

Zirui Gong, Leo Yu Zhang, Yanjun Zhang et al. · Griffith University · Swinburne University of Technology +2 more

Gradient inversion attack reconstructing training data from federated learning updates via sparse activation recovery without architectural changes

Model Inversion Attack visionfederated-learning

PDF

attack arXiv Mar 17, 2026 · 9w ago

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang et al. · University of Technology Sydney · Griffith University +2 more

Backdoor framework for semantic segmentation introducing six attack vectors and optimized triggers, bypassing existing defenses

Model Poisoning Data Poisoning Attack vision

PDF

attack arXiv Mar 5, 2026 · 11w ago

Osmosis Distillation: Model Hijacking with the Fewest Samples

Yuchen Shi, Huajie Chen, Heng Xu et al. · City University of Macau · Jinan University +1 more

Poisons distilled synthetic datasets to embed hidden hijacking tasks in models fine-tuned via transfer learning

Data Poisoning Attack Transfer Learning Attack vision

PDF

defense arXiv Mar 4, 2026 · 11w ago

From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration

Yizhe Xie, Congcong Zhu, Xinyue Zhang et al. · City University of Macau · Minzu University of China

Models and defends against injected error-seed cascades in LLM multi-agent systems via genealogy-graph message governance

Prompt Injection Excessive Agency nlp

PDF Code

attack arXiv Mar 1, 2026 · 11w ago

Hide&Seek: Remove Image Watermarks with Negligible Cost via Pixel-wise Reconstruction

Huajie Chen, Tianqing Zhu, Hailin Yang et al. · City University of Macau · CISPA Helmholtz Center for Information Security +1 more

Pixel-wise reconstruction attack removes AI-image watermarks without querying detectors or knowing the watermarking scheme

Output Integrity Attack visiongenerative

PDF

attack arXiv Mar 1, 2026 · 11w ago

Turning Black Box into White Box: Dataset Distillation Leaks

Huajie Chen, Tianqing Zhu, Yuchen Zhong et al. · City University of Macau · CISPA Helmholtz Center for Information Security +2 more

Reveals that dataset distillation leaks training data via three-stage attack: architecture inference, membership inference, and model inversion

Model Inversion Attack Membership Inference Attack vision

PDF

attack arXiv Dec 18, 2025 · Dec 2025

Dual-View Inference Attack: Machine Unlearning Amplifies Privacy Exposure

Lulu Xue, Shengshan Hu, Linqiang Qian et al. · Huazhong University of Science and Technology · Tsinghua University +4 more

Novel black-box MIA exploits dual-model access after unlearning to infer membership of retained data via likelihood ratio inference

Membership Inference Attack vision

2 citations PDF

benchmark arXiv Dec 16, 2025 · Dec 2025

Black-Box Auditing of Quantum Model: Lifted Differential Privacy with Quantum Canaries

Baobao Song, Shiva Raj Pokhrel, Athanasios V. Vasilakos et al. · University of Technology Sydney · Deakin University +2 more

Black-box canary framework audits quantum ML models for memorization, empirically lower-bounding privacy leakage via quantum differential privacy

Membership Inference Attack

PDF

defense TDSC Nov 25, 2025 · Nov 2025

Frequency Bias Matters: Diving into Robust and Generalized Deep Image Forgery Detection

Chi Liu, Tianqing Zhu, Wanlei Zhou et al. · City University of Macau · Chinese Academy of Sciences

Frequency alignment method that both evades 12 deepfake detectors as a black-box attack and improves detector robustness and generalization

Output Integrity Attack Input Manipulation Attack vision

PDF

attack arXiv Nov 19, 2025 · Nov 2025

When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers

Zhaoxin Zhang, Borui Chen, Yiming Hu et al. · City University of Macau · University of Vienna +3 more

Novel LLM jailbreak using conceptual morphology triggers to shift ideological orientation in outputs without triggering safety filters

Prompt Injection nlp

PDF

defense arXiv Nov 16, 2025 · Nov 2025

DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection

Jialiang Shen, Jiyang Zheng, Yunqi Xue et al. · The University of Sydney · Shanghai Jiao Tong University +3 more

Proposes blur-robust AI-generated image detector via DINO-based teacher-student knowledge distillation for real-world motion degradation

Output Integrity Attack vision

1 citations PDF Code

defense arXiv Nov 12, 2025 · Nov 2025

GuardFed: A Trustworthy Federated Learning Framework Against Dual-Facet Attacks

Yanli Li, Yanan Zhou, Zhongliang Guo et al. · Nantong University · The University of Sydney +3 more

Introduces dual-facet Byzantine FL attack degrading accuracy and fairness simultaneously, defended by trust-score aggregation in GuardFed

Data Poisoning Attack federated-learning

PDF

benchmark arXiv Oct 21, 2025 · Oct 2025

The Trust Paradox in LLM-Based Multi-Agent Systems: When Collaboration Becomes a Security Vulnerability

Zijie Xu, Minfeng Qi, Shiqing Wu et al. · Minzu University of China · City University of Macau +1 more

Empirically validates that higher inter-agent trust in LLM multi-agent systems increases sensitive data over-exposure and authorization boundary violations

Excessive Agency Sensitive Information Disclosure nlp

2 citations PDF

attack TIFS Oct 9, 2025 · Oct 2025

DarkHash: A Data-Free Backdoor Attack Against Deep Hashing

Ziqi Zhou, Menghao Deng, Yufei Song et al. · Huazhong University of Science and Technology · City University of Macau +1 more

Data-free backdoor attack on deep hashing models using surrogate datasets and topological alignment loss to manipulate image retrieval results

Model Poisoning vision

7 citations PDF

defense arXiv Sep 21, 2025 · Sep 2025

MARS: A Malignity-Aware Backdoor Defense in Federated Learning

Wei Wan, Yuxuan Ning, Zhicong Huang et al. · City University of Macau · Australian National University +4 more

Defends federated learning against backdoor attacks using neuron-level backdoor energy and Wasserstein clustering to detect malicious model updates

Model Poisoning federated-learningvision

5 citations PDF

defense arXiv Sep 18, 2025 · Sep 2025

Causal Fingerprints of AI Generative Models

Hui Xu, Chi Liu, Congcong Zhu et al. · City University of Macau · Qilu University of Technology +1 more

Proposes causal fingerprinting framework to attribute AI-generated images to source GANs or diffusion models via disentangled model traces