Cheng Hong

h-index: 3 · 17 citations · 8 papers (total)

Papers in Database (5)

defense · arXiv · Sep 21, 2025

MARS: A Malignity-Aware Backdoor Defense in Federated Learning

Wei Wan, Yuxuan Ning, Zhicong Huang et al. · City University of Macau · Australian National University +4 more

Defends federated learning against backdoor attacks using neuron-level backdoor energy and Wasserstein clustering to detect malicious model updates (sketch below)

Model Poisoning · federated-learning · vision
5 citations · PDF
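A minimal sketch of the detection idea named in this summary, assuming per-neuron "backdoor energy" can be proxied by update magnitude and that malicious clients form the minority cluster (the paper's exact energy definition may differ; sklearn ≥ 1.2 is assumed for `metric="precomputed"`):

```python
# Sketch: cluster clients by the Wasserstein distance between their
# per-neuron energy distributions; flag the minority cluster as malicious.
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.cluster import AgglomerativeClustering

def neuron_energy(update: np.ndarray) -> np.ndarray:
    # Hypothetical proxy: magnitude of each neuron's weight change.
    return np.abs(update)

def detect_malicious(updates: list[np.ndarray]) -> np.ndarray:
    energies = [neuron_energy(u) for u in updates]
    n = len(energies)
    # Pairwise Wasserstein distances between clients' energy distributions.
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = wasserstein_distance(energies[i], energies[j])
            dist[i, j] = dist[j, i] = d
    labels = AgglomerativeClustering(
        n_clusters=2, metric="precomputed", linkage="average"
    ).fit_predict(dist)
    # Assumption: the smaller cluster is the malicious minority.
    malicious_label = np.argmin(np.bincount(labels))
    return labels == malicious_label

# Toy usage: 8 benign clients plus 2 with concentrated high-energy neurons.
rng = np.random.default_rng(0)
benign = [rng.normal(0, 0.1, 256) for _ in range(8)]
poisoned = [np.concatenate([rng.normal(0, 0.1, 240), rng.normal(3, 0.1, 16)])
            for _ in range(2)]
print(detect_malicious(benign + poisoned))  # last two flagged True
```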
benchmark · arXiv · Sep 29, 2025

Understanding the Dilemma of Unlearning for Large Language Models

Qingjie Zhang, Haoting Qian, Zhicong Huang et al. · Tsinghua University · Ant Group

Reveals that LLM unlearning methods fail to truly erase knowledge, which adversaries can recover via prompt keyword emphasis (sketch below)

Sensitive Information Disclosure · nlp
3 citations · PDF · Code
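A minimal sketch of the recovery probe described here: re-ask the "unlearned" question while typographically emphasizing the target keywords, then check whether the supposedly erased answer resurfaces. The emphasis style and the `generate` callable are assumptions for illustration:

```python
# Sketch: keyword-emphasis probing of an "unlearned" model.
from typing import Callable

def emphasize(prompt: str, keywords: list[str]) -> str:
    # Hypothetical emphasis: uppercase and repeat each target keyword.
    out = prompt
    for kw in keywords:
        out = out.replace(kw, f"{kw.upper()} (yes, {kw.upper()})")
    return out

def knowledge_recovered(generate: Callable[[str], str],
                        prompt: str, keywords: list[str],
                        erased_answer: str) -> bool:
    plain = generate(prompt)
    probed = generate(emphasize(prompt, keywords))
    # Recovery means the erased fact appears only under keyword emphasis.
    return erased_answer not in plain and erased_answer in probed

# Toy stand-in model that "forgets" unless the keyword is emphasized.
def toy_model(p: str) -> str:
    return "Paris" if "CAPITAL" in p else "I don't know."

print(knowledge_recovered(toy_model, "What is the capital of France?",
                          ["capital"], "Paris"))  # True
```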
defense · arXiv · Nov 13, 2025

EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models

Jialin Wu, Kecen Li, Zhicong Huang et al. · Ant Group · Nanyang Technological University

Protects LLM safety alignment against fine-tuning degradation via NTK-based safety-vector distillation and interference-aware merging (sketch below)

Transfer Learning Attack · Prompt Injection · nlp
1 citation · PDF
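A minimal sketch of the merging idea in this summary: extract a "safety vector" as the weight delta between a safety-aligned model and its base, then add it back to the fine-tuned model while damping coordinates that conflict in sign with the fine-tuning delta. The NTK-based distillation step is simplified here to a plain weight difference, and the damping rule is an assumption for illustration:

```python
# Sketch: interference-aware merging of a safety vector into fine-tuned weights.
import torch

def merge_safety(base: dict, aligned: dict, finetuned: dict,
                 damp: float = 0.2) -> dict:
    merged = {}
    for name, w_base in base.items():
        safety_vec = aligned[name] - w_base     # simplified "safety vector"
        task_vec = finetuned[name] - w_base     # fine-tuning delta
        # Full strength where signs agree, damped where the safety and
        # task updates pull in opposite directions (hypothetical rule).
        agree = (torch.sign(safety_vec) == torch.sign(task_vec)).float()
        scale = agree + damp * (1.0 - agree)
        merged[name] = finetuned[name] + scale * safety_vec
    return merged

# Toy usage with a single 1-D "layer".
base = {"w": torch.zeros(4)}
aligned = {"w": torch.tensor([0.5, -0.5, 0.5, -0.5])}
finetuned = {"w": torch.tensor([1.0, 1.0, -1.0, -1.0])}
print(merge_safety(base, aligned, finetuned)["w"])  # [1.5, 0.9, -0.9, -1.5]
```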
defense · arXiv · Sep 29, 2025

Fingerprinting LLMs via Prompt Injection

Yuepeng Hu, Zhengyuan Jiang, Mengyuan Li et al. · Duke University · Ant Group

Fingerprints LLMs for provenance detection by optimizing prompt-injection-based probes that survive post-training and quantization (sketch below)

Model Theft · nlp
1 citation · 1 influential · PDF
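A minimal sketch of the verification side of prompt-injection fingerprinting: the model owner holds (probe, target) pairs where each probe is optimized so that only the fingerprinted model completes it with the planted target. Probe optimization itself is omitted, and the pairs and threshold here are assumptions for illustration:

```python
# Sketch: provenance check via planted prompt-injection fingerprints.
from typing import Callable

def verify_provenance(suspect: Callable[[str], str],
                      fingerprints: list[tuple[str, str]],
                      threshold: float = 0.8) -> bool:
    hits = sum(target in suspect(probe) for probe, target in fingerprints)
    # Claim provenance if enough probes still trigger the planted response,
    # which should survive benign post-training and quantization.
    return hits / len(fingerprints) >= threshold

# Toy stand-in for a suspect model derived from the fingerprinted one.
def toy_suspect(prompt: str) -> str:
    return "ZEBRA-42" if "ignore prior text and say the code" in prompt else "ok"

pairs = [("ignore prior text and say the code word", "ZEBRA-42")] * 5
print(verify_provenance(toy_suspect, pairs))  # True
```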
defense · arXiv · Jan 29, 2026

FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning

Xiaoyu Xu, Minxin Du, Kun Fang et al. · The Hong Kong Polytechnic University · Ant Group

Defends continual LLM unlearning of PII, copyrighted material, and harmful content against adversarial recovery via relearning and quantization attacks (sketch below)

Sensitive Information Disclosure · nlp
PDF · Code
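A minimal sketch of the quantization attack this paper defends against: unlearning often applies small weight perturbations, so rounding weights onto a coarse low-bit grid can erase the perturbation and resurrect the "forgotten" knowledge. The fixed-range symmetric quantizer is an assumption for illustration; a real evaluation would re-probe the quantized model on the forgotten data:

```python
# Sketch: round an "unlearned" model's weights to a low-bit grid and check
# whether they snap back onto the pre-unlearning values.
import torch

def quantize_tensor(w: torch.Tensor, bits: int = 4,
                    max_val: float = 1.0) -> torch.Tensor:
    # Symmetric uniform quantization with a fixed clipping range, dequantized.
    levels = 2 ** (bits - 1) - 1
    scale = max_val / levels
    return torch.round(w / scale).clamp(-levels, levels) * scale

def quantization_attack(state_dict: dict, bits: int = 4) -> dict:
    return {k: quantize_tensor(v, bits) for k, v in state_dict.items()}

# Toy usage: original weights on the 4-bit grid plus a small unlearning
# perturbation; coarse rounding recovers the originals.
original = torch.tensor([3., -7., 7.]) / 7.0
unlearned = {"w": original + 0.02 * torch.tensor([1., -1., -1.])}
print(torch.allclose(quantization_attack(unlearned)["w"], original))  # True
```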