Wanlei Zhou

attack · arXiv · Apr 10, 2026

Unreal Thinking: Chain-of-Thought Hijacking via Two-stage Backdoor

Wenhan Chang, Tianqing Zhu, Ping Xiong et al. · Zhongnan University of Economics and Law · City University of Macau

Backdoor attack embedding triggers in lightweight adapters that hijack LLM reasoning chains to display malicious thought processes

Model Poisoning · AI Supply Chain Attacks · Prompt Injection · nlp


defense · arXiv · Apr 23, 2026

CSC: Turning the Adversary's Poison against Itself

Yuchen Shi, Xin Guo, Huajie Chen et al. · City University of Macau · University of Technology Sydney

Detects poisoned training samples via early-epoch clustering and neutralizes backdoors by relabeling them to a virtual class

Model Poisoning · vision


attack · arXiv · Mar 17, 2026

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang et al. · University of Technology Sydney · Griffith University +2 more

Backdoor framework for semantic segmentation introducing six attack vectors and optimized triggers, bypassing existing defenses

Model Poisoning · Data Poisoning Attack · vision


defense · arXiv · Mar 4, 2026

From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration

Yizhe Xie, Congcong Zhu, Xinyue Zhang et al. · City University of Macau · Minzu University of China

Models and defends against injected error-seed cascades in LLM multi-agent systems via genealogy-graph message governance

Prompt Injection · Excessive Agency · nlp


attack · arXiv · Mar 5, 2026

Osmosis Distillation: Model Hijacking with the Fewest Samples

Yuchen Shi, Huajie Chen, Heng Xu et al. · City University of Macau · Jinan University +1 more

Poisons distilled synthetic datasets to embed hidden hijacking tasks in models fine-tuned via transfer learning

Data Poisoning Attack · Transfer Learning Attack · vision


attack · arXiv · Mar 1, 2026

Turning Black Box into White Box: Dataset Distillation Leaks

Huajie Chen, Tianqing Zhu, Yuchen Zhong et al. · City University of Macau · CISPA Helmholtz Center for Information Security +2 more

Reveals that dataset distillation leaks training data via a three-stage attack: architecture inference, membership inference, and model inversion

Model Inversion Attack · Membership Inference Attack · vision


attack · arXiv · Mar 1, 2026

Hide&Seek: Remove Image Watermarks with Negligible Cost via Pixel-wise Reconstruction

Huajie Chen, Tianqing Zhu, Hailin Yang et al. · City University of Macau · CISPA Helmholtz Center for Information Security +1 more

Pixel-wise reconstruction attack removes AI-image watermarks without querying detectors or knowing the watermarking scheme

Output Integrity Attack · vision · generative
