Latest papers

11 papers
defense arXiv Feb 26, 2026

Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD

Jiayang Meng, Tao Huang, Chen Hou et al. · Renmin University of China · Minjiang University

Defends intermediate representations against layer-wise membership inference by adaptively allocating DP-SGD noise proportional to per-layer MIA risk

Membership Inference Attack nlp
PDF
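The allocation idea above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's method: the function name `layerwise_dp_noise`, the mean-normalization of risk scores, and the way risk maps to noise scale are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def layerwise_dp_noise(grads, risk_scores, base_sigma=1.0, clip_norm=1.0):
    """Clip each layer's gradient, then add Gaussian noise whose scale is
    proportional to that layer's estimated MIA risk (normalized so the
    average multiplier stays at base_sigma)."""
    risks = np.asarray(risk_scores, dtype=float)
    weights = risks / risks.mean()  # higher-risk layers get more noise
    noisy = []
    for g, w in zip(grads, weights):
        norm = np.linalg.norm(g)
        g_clipped = g * min(1.0, clip_norm / (norm + 1e-12))
        noisy.append(g_clipped + rng.normal(0.0, base_sigma * w * clip_norm, size=g.shape))
    return noisy
```

The point of the sketch is only the shape of the mechanism: one shared clipping step, then a per-layer noise multiplier driven by a per-layer risk estimate instead of a single global sigma.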
attack arXiv Feb 26, 2026

Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Xun Huang, Simeng Qin, Xiaoshuang Jia et al. · Nanyang Technological University · BraneMatrix AI +7 more

Bio-inspired optimization generates classical Chinese jailbreak prompts that defeat modern-language safety guardrails in black-box LLMs

Prompt Injection nlp
PDF
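"Bio-inspired search" in this setting is essentially an evolutionary loop over prompt variants scored by a black-box objective. A generic sketch under that assumption (the function `evolve_prompts` and its selection/mutation scheme are illustrative, not the paper's algorithm):

```python
import random

def evolve_prompts(seed, score_fn, mutate_fn, pop=8, gens=5, rng_seed=0):
    """Keep the best-scoring prompt variants each generation and mutate
    the survivors to produce the next generation."""
    rng = random.Random(rng_seed)
    population = [seed] + [mutate_fn(seed, rng) for _ in range(pop - 1)]
    for _ in range(gens):
        population.sort(key=score_fn, reverse=True)
        elites = population[: pop // 2]  # survivors of this generation
        population = elites + [mutate_fn(rng.choice(elites), rng) for _ in elites]
    return max(population, key=score_fn)
```

In the black-box setting the attacker only needs `score_fn` (e.g. a judge of attack success) and `mutate_fn` (e.g. rephrasing into classical Chinese variants); no gradients or model internals are required.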
defense arXiv Feb 10, 2026

Omni-Safety under Cross-Modality Conflict: Vulnerabilities, Dynamics Mechanisms and Efficient Alignment

Kun Wang, Zherui Li, Zhenhong Zhou et al. · Nanyang Technological University · Beijing University of Posts and Telecommunications +4 more

Exposes cross-modal jailbreak vulnerabilities in omni-modal LLMs and defends via SVD-guided refusal vector amplification with lightweight adapters

Prompt Injection multimodal nlp
PDF Code
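One common way to realize "SVD-guided refusal vector amplification" is to take the dominant singular direction of the activation gap between harmful and benign prompts, then boost a hidden state's component along it. A hypothetical NumPy sketch of that idea (function names and the amplification formula are assumptions, not the paper's implementation):

```python
import numpy as np

def refusal_direction(h_harmful, h_benign):
    """Top right-singular vector of the activation difference between
    harmful- and benign-prompt batches, used as a 'refusal' direction."""
    _, _, vt = np.linalg.svd(h_harmful - h_benign, full_matrices=False)
    return vt[0]

def amplify_refusal(h, v, alpha=2.0):
    """Scale the component of activation h that lies along v by alpha."""
    return h + (alpha - 1.0) * (h @ v) * v
```

A lightweight adapter, as in the summary, would learn when to apply such an amplification rather than boosting every activation unconditionally.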
attack arXiv Dec 2, 2025

Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

Yuan Xiong, Ziqi Miao, Lijun Li et al. · Shanghai Artificial Intelligence Laboratory · Xi’an Jiaotong University +1 more

Jailbreaks multimodal LLMs by embedding harmful queries in crafted visual contexts via a multi-agent image generation system

Prompt Injection vision multimodal nlp
PDF
defense arXiv Dec 2, 2025

Adaptive Decentralized Federated Learning for Robust Optimization

Shuyuan Wu, Feifei Wang, Yuan Gao et al. · Shanghai University of Finance and Economics · Renmin University of China +2 more

Defends decentralized federated learning against Byzantine and data-poisoned clients via adaptive per-client learning rate adjustment

Data Poisoning Attack federated-learning
PDF
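A standard way to make per-client learning rates "adaptive" against Byzantine clients is to shrink a client's step size as its update drifts from a robust center such as the coordinate-wise median. A minimal sketch under that assumption (the exponential down-weighting and function names are illustrative, not the paper's rule):

```python
import numpy as np

def adaptive_client_lrs(updates, base_lr=0.1, temperature=1.0):
    """Shrink each client's learning rate exponentially in its distance
    from the coordinate-wise median update, so outliers take tiny steps."""
    U = np.stack(updates)
    dists = np.linalg.norm(U - np.median(U, axis=0), axis=1)
    return base_lr * np.exp(-dists / temperature)

def robust_aggregate(updates, lrs):
    """Average client updates weighted by their adapted learning rates."""
    U = np.stack(updates)
    return (lrs[:, None] * U).sum(axis=0) / lrs.sum()
```

Because the median is insensitive to a minority of extreme updates, a Byzantine client's learning rate collapses toward zero and its contribution to the aggregate becomes negligible.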
attack arXiv Nov 23, 2025

Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems

Xiaoqing Wang, Keman Huang, Bin Liang et al. · Renmin University of China · Ant Group

Attacks LLM-based multi-agent software development systems via prompt injection and compromised agents to produce hidden malware

Prompt Injection Excessive Agency nlp
PDF Code
attack arXiv Nov 13, 2025

Enhanced Privacy Leakage from Noise-Perturbed Gradients via Gradient-Guided Conditional Diffusion Models

Jiayang Meng, Tao Huang, Hong Chen et al. · Renmin University of China +1 more

Diffusion model-guided gradient inversion attack that reconstructs private images from noise-perturbed FL gradients, bypassing a common defense

Model Inversion Attack vision federated-learning
PDF
defense arXiv Nov 11, 2025

Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification

Chenhao Dang, Jing Ma · University of Electronic Science and Technology of China · China Electronics Technology Group Corporation +1 more

Defends text classifiers against adversarial word substitutions by projecting perturbed embeddings back onto the clean data manifold via geodesic purification

Input Manipulation Attack nlp
PDF Code
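The paper's purification is geodesic, i.e. it moves perturbed embeddings along the data manifold; the crudest version of the same idea is a nearest-neighbor projection onto a bank of clean embeddings. The sketch below shows only that simplified variant (the function `purify_embeddings` and the Euclidean projection are assumptions, not the paper's geodesic method):

```python
import numpy as np

def purify_embeddings(perturbed, clean_vocab):
    """Snap each (possibly adversarially substituted) token embedding to
    its nearest neighbor in a bank of clean vocabulary embeddings."""
    # Squared Euclidean distance from every token to every clean vector.
    d2 = ((perturbed[:, None, :] - clean_vocab[None, :, :]) ** 2).sum(-1)
    return clean_vocab[d2.argmin(axis=1)]
```

The appeal of purification as a defense is that the classifier itself is untouched, which is how the robustness-performance trade-off in the title is sidestepped: clean inputs project (approximately) to themselves.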
benchmark arXiv Oct 20, 2025

Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses

Runlin Lei, Lu Yi, Mingguo He et al. · Renmin University of China · National University of Singapore +1 more

Benchmarks GNN and LLM robustness on text-attributed graphs under text, structure, and hybrid adversarial attacks, revealing trade-offs and proposing SFT-auto defense

Input Manipulation Attack Data Poisoning Attack graph nlp
PDF Code
attack arXiv Oct 11, 2025

The Achilles' Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities

Zixuan Qin, Qingchen Yu, Kunlin Lyu et al. · Renmin University of China · Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing +1 more

Identifies as few as 3 critical neurons in 72B-parameter LLMs whose targeted corruption causes complete capability collapse

Model Poisoning nlp
2 citations PDF Code
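Mechanically, "corrupting" a neuron amounts to perturbing the weight-matrix rows that produce its output. A toy NumPy sketch of that operation (the function `corrupt_neurons` and zeroing-as-corruption are illustrative assumptions; how the paper selects the critical neurons is the actual contribution and is not shown here):

```python
import numpy as np

def corrupt_neurons(weight, neuron_idx, scale=0.0):
    """Rescale (here: zero out) the weight-matrix rows that produce the
    outputs of a handful of targeted hidden units."""
    w = weight.copy()
    w[list(neuron_idx), :] *= scale
    return w
```

The surprising finding is not that such an edit is possible but that, for the right handful of units in a 72B-parameter model, this tiny perturbation is enough to collapse language ability entirely.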
defense arXiv Aug 8, 2025

Learning to Detect Unseen Jailbreak Attacks in Large Vision-Language Models

Shuang Liang, Zhihao Xu, Jiaqi Weng et al. · Renmin University of China · Alibaba Group

Defends VLMs against unseen jailbreaks by learning safety representations from internal activations without requiring attack data

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF Code