Latest papers

18 papers
defense arXiv Apr 1, 2026 · 5d ago

Shapley-Guided Neural Repair Approach via Derivative-Free Optimization

Xinyu Sun, Wanwei Liu, Haoang Chi et al. · National University of Defense Technology · Nanjing University +1 more

Interpretable DNN repair using Shapley-guided fault localization and derivative-free optimization for backdoor removal, adversarial defense, and fairness

Input Manipulation Attack Model Poisoning vision
PDF
defense arXiv Mar 16, 2026 · 21d ago

BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator

Ruyi Zhang, Heng Gao, Songlei Jian et al. · National University of Defense Technology

LLM-powered trigger generator using reinforcement learning to detect and remove backdoors in NLP models via adversarial training

Model Poisoning nlp
PDF Code
attack arXiv Mar 4, 2026 · 4w ago

LEA: Label Enumeration Attack in Vertical Federated Learning

Wenhao Jiang, Shaojing Fu, Yuchuan Luo et al. · National University of Defense Technology

Infers private labels in vertical federated learning by enumerating label permutations and comparing gradient cosine similarity, without auxiliary data

Model Inversion Attack federated-learning
PDF
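The gradient-matching idea behind this kind of label inference can be sketched generically. The following is a minimal illustrative toy, not the paper's implementation: it assumes a simple logistic head, an attacker who observes only the batch's aggregate gradient and knows the label counts, and it enumerates candidate labelings scored by gradient cosine similarity. All names and the model are assumptions for illustration.

```python
# Minimal sketch of label inference via gradient cosine similarity.
# The logistic head and all variable names are illustrative assumptions.
from itertools import permutations

import numpy as np

rng = np.random.default_rng(0)

def logistic_grad(x, y, w):
    """Aggregate (batch-summed) logistic-loss gradient w.r.t. weights w."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return ((p - y)[:, None] * x).sum(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy setting: 4 samples whose binary labels the label party keeps private.
x = rng.normal(size=(4, 3))
true_y = np.array([1, 0, 1, 0])
w = rng.normal(size=3)

# The attacker observes only the aggregate gradient of the batch.
observed = logistic_grad(x, true_y, w)

# Enumerate candidate labelings (permutations of the known label multiset)
# and keep the one whose induced gradient best aligns with the observed one.
best = max(
    (np.array(c) for c in permutations(true_y.tolist())),
    key=lambda cand: cosine(logistic_grad(x, cand, w), observed),
)
```

The true labeling induces exactly the observed gradient (cosine 1.0), so it maximizes the score; the paper's setting replaces this toy head with real VFL gradients.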
defense arXiv Feb 6, 2026 · 8w ago

TrapSuffix: Proactive Defense Against Adversarial Suffixes in Jailbreaking

Mengyao Du, Han Fang, Haokai Ma et al. · National University of Defense Technology · National University of Singapore +1 more

Proactive fine-tuning defense traps gradient-based jailbreak suffixes or fingerprints them, cutting LLM attack success below 0.01%

Input Manipulation Attack Prompt Injection nlp
PDF
attack arXiv Jan 24, 2026 · 10w ago

Reconstructing Training Data from Adapter-based Federated Large Language Models

Silong Chen, Yuchuan Luo, Guilin Deng et al. · National University of Defense Technology · City University of Hong Kong

Gradient inversion attack reconstructs training text from LoRA adapter gradients in federated LLMs, achieving ROUGE-1/2 scores over 99

Model Inversion Attack Sensitive Information Disclosure nlp federated-learning
PDF Code
defense arXiv Jan 20, 2026 · 10w ago

MirageNet: A Secure, Efficient, and Scalable On-Device Model Protection in Heterogeneous TEE and GPU System

Huadi Zheng, Li Cheng, Yan Ding · National University of Defense Technology

Defends edge-deployed DNN model IP from theft via TEE-GPU obfuscation, cutting overhead by 16% versus GroupCover

Model Theft vision
PDF
defense arXiv Jan 6, 2026 · Jan 2026

JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification

Xi Wang, Songlei Jian, Shasha Li et al. · National University of Defense Technology

Defends LLMs against jailbreaks by unlearning dynamic information paths that reassemble harmful outputs, not just isolated parameters

Prompt Injection nlp
PDF
attack arXiv Dec 25, 2025 · Dec 2025

Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation

Tian Li, Bo Lin, Shangwen Wang et al. · National University of Defense Technology

Backdoors RACG retrievers to inject vulnerable code into LLM context, achieving 40%+ vulnerable code generation while bypassing defenses

Model Poisoning Prompt Injection nlp generative
PDF
attack arXiv Dec 16, 2025 · Dec 2025

Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space

Xingfu Zhou, Pengfei Wang · National University of Defense Technology

Poisons LLM agent reasoning by style-transferring retrieved docs into pathological tones, bypassing content filters without altering facts

Prompt Injection nlp
2 citations PDF
attack TDSC Dec 16, 2025 · Dec 2025

Optimizing the Adversarial Perturbation with a Momentum-based Adaptive Matrix

Wei Tao, Sheng Long, Xin Liu et al. · National University of Defense Technology · Academy of Military Science +3 more

AdaMI: momentum-based adaptive matrix attack that provably improves adversarial transferability over PGD and MI-FGSM across networks

Input Manipulation Attack vision
PDF
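The general shape of a momentum-plus-adaptive-matrix perturbation update can be sketched as follows. This is an illustrative MI-FGSM/Adam-style hybrid under assumed names and a toy loss, not the paper's AdaMI algorithm: momentum accumulates the gradient direction while a diagonal second-moment estimate rescales each coordinate's step, with projection back onto the L-infinity ball.

```python
import numpy as np

def adaptive_momentum_attack(grad_fn, x, eps=0.3, eps_step=0.05, steps=10,
                             mu=0.9, beta=0.999):
    """Iteratively perturb x to increase a loss, combining gradient momentum
    with a diagonal adaptive scaling matrix, then project onto the eps-ball."""
    x0 = x.copy()
    m, v = np.zeros_like(x), np.zeros_like(x)
    x_adv = x.copy()
    for t in range(1, steps + 1):
        g = grad_fn(x_adv)
        m = mu * m + (1 - mu) * g            # momentum (first moment)
        v = beta * v + (1 - beta) * g ** 2   # adaptive matrix (second moment)
        m_hat = m / (1 - mu ** t)            # bias correction
        v_hat = v / (1 - beta ** t)
        x_adv = x_adv + eps_step * m_hat / (np.sqrt(v_hat) + 1e-8)
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)  # stay in the L-inf ball
    return x_adv

# Toy target: ascend the loss 0.5 * ||x||^2, whose gradient is x itself,
# so the perturbation should push each coordinate outward from the origin.
x0 = np.array([0.1, -0.2])
x_adv = adaptive_momentum_attack(lambda z: z, x0)
```

In a real attack, `grad_fn` would return the loss gradient of a surrogate network; the adaptive rescaling is what distinguishes this family from plain MI-FGSM's sign update.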
defense medRxiv Dec 5, 2025 · Dec 2025

The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs

Jiale Zhao, Xing Mou, Jinlin Wu et al. · National University of Defense Technology · Chinese Academy of Sciences +3 more

Defends Medical MLLMs against cross-modality jailbreaks by grafting safety knowledge from base models during fine-tuning via parameter-space intervention

Transfer Learning Attack Prompt Injection multimodal vision nlp
PDF
benchmark arXiv Nov 22, 2025 · Nov 2025

Beyond Jailbreak: Unveiling Risks in LLM Applications Arising from Blurred Capability Boundaries

Yunyi Zhang, Shibo Cui, Baojun Liu et al. · Tsinghua University · National University of Defense Technology +1 more

Discovers LLM apps routinely exceed intended capability boundaries, with 17 apps performing malicious tasks without any adversarial prompting

Excessive Agency Prompt Injection nlp
PDF
attack Chinese Conference on Pattern ... Nov 10, 2025 · Nov 2025

FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection

Yulin Chen, Zeyuan Wang, Tianyuan Yu et al. · National University of Defense Technology

Gradient-based adversarial framework fools CLIP image-quality metrics, then detects tampered images via grayscale color-channel sensitivity

Input Manipulation Attack vision multimodal
PDF
defense arXiv Oct 9, 2025 · Oct 2025

Provably Robust Adaptation for Language-Empowered Foundation Models

Yuni Lai, Xiaoyu Xue, Linghui Shen et al. · The Hong Kong Polytechnic University · National University of Defense Technology +2 more

Certifiably robust few-shot classifier for CLIP/GraphCLIP using trimmed-mean prototypes and randomized smoothing against support-set poisoning

Data Poisoning Attack vision graph multimodal
1 citation PDF
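The trimmed-mean prototype idea is simple to illustrate. The sketch below is a generic coordinate-wise trimmed mean under assumed names, not the paper's certified pipeline: dropping the extreme values per dimension bounds how far a few poisoned support embeddings can pull a class prototype.

```python
import numpy as np

def trimmed_mean_prototype(embeddings, trim=1):
    """Coordinate-wise trimmed mean of support embeddings: per dimension,
    drop the `trim` largest and `trim` smallest values before averaging,
    limiting the influence of any single poisoned support vector."""
    s = np.sort(embeddings, axis=0)
    return s[trim: len(embeddings) - trim].mean(axis=0)

# Toy support set for one class: 5 clean embeddings plus 1 poisoned outlier.
clean = np.ones((5, 3))
poison = np.full((1, 3), 100.0)
proto = trimmed_mean_prototype(np.vstack([clean, poison]), trim=1)
```

Here the plain mean would be dragged to 17.5 per coordinate by the outlier, while the trimmed mean stays at 1.0; the paper pairs such robust prototypes with randomized smoothing to make the robustness certifiable.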
attack arXiv Oct 1, 2025 · Oct 2025

Erased, But Not Forgotten: Erased Rectified Flow Transformers Still Remain Unsafe Under Concept Attack

Nanxiang Jiang, Zhaoxin Fan, Enhan Kang et al. · Beihang University · University of Science and Technology of China +3 more

Attacks concept erasure safety in Flux T2I models by exploiting attention localization, reactivating suppressed content via a 3.57 MB plug-and-play adapter

Input Manipulation Attack vision generative
1 citation PDF Code
attack arXiv Sep 25, 2025 · Sep 2025

RLCracker: Exposing the Vulnerability of LLM Watermarks with Adaptive RL Attacks

Hanbo Huang, Yiran Zhang, Hao Zheng et al. · Shanghai Jiao Tong University · National University of Defense Technology

RL-based attack removes LLM text watermarks with 98.5% success using 100 training samples, defeating 10 watermarking schemes

Output Integrity Attack nlp
PDF
defense arXiv Aug 27, 2025 · Aug 2025

Learning from Peers: Collaborative Ensemble Adversarial Training

Dengjin Li, Yanming Guo, Yuxiang Xie et al. · National University of Defense Technology

Defends against adversarial examples via collaborative ensemble training that reweights samples by cross-model prediction disparity

Input Manipulation Attack vision
PDF
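The reweighting signal such a scheme relies on can be sketched generically. This toy (assumed names, not the paper's training loop) weights each sample by how much the ensemble members' predicted distributions disagree, so hard or contested samples receive more training emphasis.

```python
import numpy as np

def disparity_weights(probs):
    """probs: array of shape (models, samples, classes) holding each member's
    predicted class probabilities. Returns normalized per-sample weights
    that grow with cross-model prediction disparity."""
    mean_p = probs.mean(axis=0)                        # ensemble average
    disparity = ((probs - mean_p) ** 2).sum(axis=(0, 2))  # per-sample spread
    return disparity / (disparity.sum() + 1e-12)

# Toy ensemble of 2 models on 2 samples: they agree on sample 0 and
# disagree completely on sample 1, so sample 1 should dominate the weights.
probs = np.array([[[1.0, 0.0], [1.0, 0.0]],
                  [[1.0, 0.0], [0.0, 1.0]]])
weights = disparity_weights(probs)
```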
attack arXiv Aug 25, 2025 · Aug 2025

Stand on The Shoulders of Giants: Building JailExpert from Previous Attack Experience

Xi Wang, Songlei Jian, Shasha Li et al. · National University of Defense Technology · Inner Mongolia University

Automated LLM jailbreak framework that reuses structured past attack experience to boost success rate by 17% over SOTA black-box methods

Prompt Injection nlp
PDF Code