Latest papers

28 papers
defense arXiv Apr 1, 2026 · 5d ago

Shapley-Guided Neural Repair Approach via Derivative-Free Optimization

Xinyu Sun, Wanwei Liu, Haoang Chi et al. · National University of Defense Technology · Nanjing University +1 more

Interpretable DNN repair using Shapley-guided fault localization and derivative-free optimization for backdoor removal, adversarial defense, and fairness

Input Manipulation Attack Model Poisoning vision
PDF
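Shapley-guided fault localization scores each neuron by its average marginal contribution to model behavior. As a generic illustration (not the paper's algorithm), Shapley values can be estimated by Monte Carlo permutation sampling; `value_fn` and the player set here are hypothetical stand-ins for a neuron-utility function and neuron indices.

```python
import random

def shapley_estimate(value_fn, players, n_perm=200, seed=0):
    """Monte Carlo permutation estimate of Shapley values.

    value_fn maps a coalition (set of players, e.g. neuron indices)
    to a scalar utility; each player's Shapley value is its average
    marginal contribution over random orderings.
    """
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_perm):
        order = players[:]
        rng.shuffle(order)
        coalition, prev = set(), value_fn(set())
        for p in order:
            coalition.add(p)
            cur = value_fn(coalition)
            phi[p] += cur - prev  # marginal contribution of p
            prev = cur
    return {p: v / n_perm for p, v in phi.items()}
```

For an additive game (utility is the sum of player values), the estimate recovers each player's own value exactly, which makes the sketch easy to sanity-check.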
defense arXiv Mar 25, 2026 · 12d ago

Enhancing and Reporting Robustness Boundary of Neural Code Models for Intelligent Code Understanding

Tingxu Han, Wei Song, Weisong Sun et al. · Nanjing University · University of New South Wales +2 more

Black-box certified defense for code models using randomized smoothing to reduce adversarial attack success from 42% to 9.74%

Input Manipulation Attack nlp
PDF
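Randomized smoothing for discrete inputs such as code typically classifies by majority vote over randomly perturbed copies of the input. A minimal generic sketch (not the paper's certified construction); `classify` and `mask_token` are hypothetical placeholders for a black-box code classifier and its masking scheme.

```python
import random
from collections import Counter

def smoothed_predict(classify, tokens, n_samples=25, mask_prob=0.1,
                     mask_token="<mask>"):
    """Majority-vote prediction over randomly masked copies of a token
    sequence, approximating randomized smoothing for discrete inputs."""
    votes = Counter()
    for _ in range(n_samples):
        noisy = [mask_token if random.random() < mask_prob else t
                 for t in tokens]
        votes[classify(noisy)] += 1
    label, _ = votes.most_common(1)[0]
    return label
```

Because adversarial token edits must flip the vote of most noisy copies rather than a single forward pass, the smoothed classifier is markedly harder to attack.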
defense arXiv Mar 2, 2026 · 5w ago

Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report)

Yu Lin, Qizhi Zhang, Wenqiang Ruan et al. · ByteDance · Nanjing University

Defends user input privacy in cloud LLM inference by obfuscating activations to resist internal state inversion attacks

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
defense arXiv Feb 18, 2026 · 6w ago

SRFed: Mitigating Poisoning Attacks in Privacy-Preserving Federated Learning with Heterogeneous Data

Yiwen Lu · Nanjing University

Defends federated learning against Byzantine poisoning and server-side gradient inference attacks using functional encryption and clustering-based aggregation

Data Poisoning Attack Model Inversion Attack federated-learning
PDF
defense arXiv Feb 12, 2026 · 7w ago

Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs

Dong Yan, Jian Liang, Ran He et al. · University of Chinese Academy of Sciences · Chinese Academy of Sciences +1 more

Defends against LLM attribute inference attacks using fine-grained anonymization and adversarial suffix optimization to induce model rejection

Sensitive Information Disclosure nlp
1 citation PDF Code

attack arXiv Feb 6, 2026 · 8w ago

Confundo: Learning to Generate Robust Poison for Practical RAG Systems

Haoyang Hu, Zhejun Jiang, Yueming Lyu et al. · The University of Hong Kong · Nanjing University +1 more

Fine-tunes an LLM as a poison generator to inject robust, chunking-aware malicious content into RAG knowledge bases

Data Poisoning Attack Prompt Injection nlp
PDF
defense arXiv Feb 1, 2026 · 9w ago

Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts

Songping Wang, Qinglong Liu, Yueming Lyu et al. · Nanjing University · Ltd. +1 more

Proposes component-level adversarial attacks and defenses targeting routers and experts in video MoE models

Input Manipulation Attack vision
1 citation PDF
defense arXiv Feb 1, 2026 · 9w ago

Who Transfers Safety? Identifying and Targeting Cross-Lingual Shared Safety Neurons

Xianhui Zhang, Chengyu Xie, Linxia Zhu et al. · Nanjing University of Science and Technology · National University of Singapore +2 more

Identifies sparse cross-lingual safety neurons in LLMs and proposes targeted fine-tuning to close multilingual jailbreak safety gaps

Prompt Injection nlp
PDF Code
defense arXiv Jan 29, 2026 · 9w ago

AtPatch: Debugging Transformers via Hot-Fixing Over-Attention

Shihao Weng, Yang Feng, Jincheng Li et al. · Nanjing University · Singapore Management University

Inference-time defense that neutralizes backdoor triggers in transformers by detecting and redistributing anomalous attention maps without modifying weights

Model Poisoning vision nlp
PDF
defense arXiv Jan 29, 2026 · 9w ago

TraceRouter: Robust Safety for Large Foundation Models via Path-Level Intervention

Chuancheng Shi, Shangze Li, Wenjun Lu et al. · The University of Sydney · Nanjing University of Science and Technology +2 more

Defends LLMs, diffusion models, and MLLMs from jailbreaks by tracing and severing harmful semantic circuits via sparse autoencoders and causal path analysis

Input Manipulation Attack Prompt Injection nlp vision multimodal generative
PDF
defense arXiv Jan 23, 2026 · 10w ago

SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment

Xianya Fang, Xianying Luo, Yadong Wang et al. · Nanjing University of Aeronautics and Astronautics · Tsinghua University +3 more

Adaptive three-stage LLM defense routes inputs by risk level to counter jailbreaks and prefilling attacks without sacrificing utility

Prompt Injection nlp
PDF
attack arXiv Jan 18, 2026 · 11w ago

Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?

Yi Qian, Kunwei Qian, Xingbang He et al. · Nanjing University · Ltd +1 more

Attacks VLM-powered Android GUI agents by hijacking UI state between observation and action, achieving 100% success with zero permissions

Prompt Injection Excessive Agency multimodal
PDF
attack arXiv Dec 24, 2025 · Dec 2025

CoTDeceptor: Adversarial Code Obfuscation Against CoT-Enhanced LLM Code Agents

Haoyang Li, Mingjin Li, Jinxin Zuo et al. · Beijing University of Posts and Telecommunications · Chinese Academy of Sciences +3 more

Adversarial code obfuscation framework that exploits CoT reasoning chain weaknesses to evade LLM-based vulnerability detectors

Input Manipulation Attack Prompt Injection nlp
PDF Code
benchmark TrustCom Dec 17, 2025 · Dec 2025

Bits for Privacy: Evaluating Post-Training Quantization via Membership Inference

Chenxiang Zhang, Tongxi Qu, Zhong Li et al. · University of Luxembourg · Nanjing University

Evaluates how post-training quantization affects membership inference vulnerability, finding 1.58-bit models leak an order of magnitude less membership information than full-precision models

Membership Inference Attack vision
PDF
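The membership inference attacks evaluated in work like this are often variants of the classic loss-threshold attack: a sample is flagged as a training member when the model's loss on it falls below a calibrated threshold. A minimal generic sketch, not the paper's evaluation pipeline; `model_loss` is a hypothetical callable returning a per-sample loss.

```python
def loss_threshold_mia(model_loss, sample, threshold):
    """Flag `sample` as a training member when its loss is below
    `threshold` (members tend to have lower loss than non-members)."""
    return model_loss(sample) < threshold

def attack_accuracy(model_loss, members, non_members, threshold):
    """Balanced accuracy of the threshold attack on labeled
    member / non-member sets."""
    tp = sum(loss_threshold_mia(model_loss, s, threshold)
             for s in members)
    tn = sum(not loss_threshold_mia(model_loss, s, threshold)
             for s in non_members)
    return 0.5 * (tp / len(members) + tn / len(non_members))
```

Balanced accuracy near 0.5 indicates the model leaks little membership signal, which is the regime low-bit quantized models reportedly approach.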
attack arXiv Dec 10, 2025 · Dec 2025

Reference Recommendation based Membership Inference Attack against Hybrid-based Recommender Systems

Xiaoxiao Chi, Xuyun Zhang, Yan Wang et al. · Macquarie University · The University of Newcastle +1 more

Novel metric-based membership inference attack against hybrid recommender systems using reference recommendations to infer user training membership

Membership Inference Attack tabular
PDF
attack arXiv Dec 7, 2025 · Dec 2025

RunawayEvil: Jailbreaking the Image-to-Video Generative Models

Songping Wang, Rufan Qian, Yueming Lyu et al. · Nanjing University · Meituan +1 more

Self-evolving RL+LLM jailbreak framework for Image-to-Video models outperforms baselines by up to 79% via coordinated text-image attacks

Prompt Injection multimodal generative vision nlp
2 citations PDF
defense arXiv Dec 4, 2025 · Dec 2025

A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World

Jikang Cheng, Renye Yan, Zhiyuan Yan et al. · Peking University · Nanjing University +3 more

Proposes DevDet framework that amplifies real/fake differences over domain signals for robust multi-domain deepfake detection

Output Integrity Attack vision
PDF
attack arXiv Dec 1, 2025 · Dec 2025

EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations

Xinyun Zhou, Xinfeng Li, Yinan Peng et al. · Zhejiang University · Hengxin Technology +5 more

Emoticon injection into RAG queries poisons retrieval with ~100% success, exposing a critical vulnerability in LLM-integrated systems

Input Manipulation Attack Prompt Injection nlp
1 citation PDF
attack arXiv Nov 27, 2025 · Nov 2025

Distillability of LLM Security Logic: Predicting Attack Success Rate of Outline Filling Attack via Ranking Regression

Tianyu Zhang, Zihang Xi, Jingyu Hua et al. · Nanjing University

Builds a lightweight proxy that predicts jailbreak success rates, enabling black-box-to-quasi-white-box attack optimization of LLMs

Prompt Injection nlp
PDF
defense arXiv Nov 24, 2025 · Nov 2025

ConceptGuard: Proactive Safety in Text-and-Image-to-Video Generation through Multimodal Risk Detection

Ruize Ma, Minghong Cai, Yilei Jiang et al. · The Chinese University of Hong Kong · Nanjing University +2 more

Proactive multimodal safety guardrail for video generation that detects unsafe text+image prompts and suppresses harmful concept generation

Prompt Injection multimodal generative vision
1 citation 1 influential PDF Code