Latest papers

4 papers
defense arXiv Jan 12, 2026 · 12w ago

Safe-FedLLM: Delving into the Safety of Federated Large Language Models

Mingxiang Tao, Yu Tian, Wenxuan Tu et al. · Hainan University · Tsinghua University +1 more

Probe-based defense framework classifies LoRA weight updates to detect and suppress malicious clients in federated LLM fine-tuning

Model Poisoning Data Poisoning Attack Training Data Poisoning federated-learningnlp
PDF Code
survey arXiv Oct 9, 2025 · Oct 2025

Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs

Man Hu, Xinyi Wu, Zuofeng Suo et al. · Beijing Electronic Science and Technology Institute · Nanyang Technological University +1 more

First survey on backdoor attacks targeting LLM reasoning processes, proposing a three-type taxonomy of associative, passive, and active backdoors

Model Poisoning nlp
PDF
defense arXiv Aug 24, 2025 · Aug 2025

Risk Assessment and Security Analysis of Large Language Models

Xiaoyan Zhang, Dongyang Lyu, Xiaoqi Li · Hainan University

Hierarchical LLM defense framework combining BERT-CRF input filtering, adversarial training, and neural output watermarking to detect jailbreaks

Output Integrity Attack Prompt Injection nlp
PDF
survey arXiv Aug 13, 2025 · Aug 2025

Security Analysis of ChatGPT: Threats and Privacy Risks

Yushan Xiang, Zhongwen Li, Xiaoqi Li · Hainan University

Surveys ChatGPT security threats and privacy risks including prompt injection, training data leakage, and model stealing

Model Theft Model Inversion Attack Prompt Injection Sensitive Information Disclosure nlp
PDF