Shouling Ji

defense arXiv Mar 4, 2026 · 5w ago

Beyond Input Guardrails: Reconstructing Cross-Agent Semantic Flows for Execution-Aware Attack Detection

Yangyang Wei, Yijie Xu, Zhenyuan Li et al. · Zhejiang University · Hofstra University

Defends multi-agent LLM systems against indirect prompt injection by reconstructing cross-agent semantic flows for behavioral anomaly detection

Prompt Injection Excessive Agency nlp

PDF Code

defense arXiv Feb 6, 2026 · 8w ago

TrapSuffix: Proactive Defense Against Adversarial Suffixes in Jailbreaking

Mengyao Du, Han Fang, Haokai Ma et al. · National University of Defense Technology · National University of Singapore +1 more

Proactive fine-tuning defense traps gradient-based jailbreak suffixes or fingerprints them, cutting LLM attack success below 0.01%

Input Manipulation Attack Prompt Injection nlp

PDF

tool arXiv Sep 4, 2025 · Sep 2025

NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models

Chuhan Zhang, Ye Zhang, Bowen Shi et al. · Zhejiang University

Builds neuron-level analysis tool to dissect LLM jailbreak mechanisms via layer-wise probing and critical neuron identification

Prompt Injection nlp

PDF

defense arXiv Aug 30, 2025 · Aug 2025

FreeTalk: A plug-and-play and black-box defense against speech synthesis attacks

Yuwen Pu, Zhou Feng, Chunyi Zhou et al. · Chongqing University · Zhejiang University

Adds frequency-domain adversarial perturbations to audio in a black-box setting to prevent voice cloning by VC/TTS models

Input Manipulation Attack audio

PDF

benchmark arXiv Mar 21, 2026 · 18d ago

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

Jiahao Chen, Zhiming Zhao, Yuwen Pu et al. · Zhejiang University · Chongqing University +1 more

Measurement study showing FL poisoning attacks are less effective in practice than research suggests due to heterogeneity and stability constraints

Data Poisoning Attack vision nlp tabular federated-learning

PDF Code

Federated learning (FL) has attracted substantial attention in both academia and industry, yet its practical security posture remains poorly understood. In particular, a large body of poisoning research is evaluated under idealized assumptions about attacker participation, client homogeneity, and success metrics, which can substantially distort how security risks are perceived in deployed FL systems. This paper revisits FL security from a measurement perspective. We systematize three major sources of mismatch between research and practice: unrealistic poisoning threat models, the omission of hybrid heterogeneity, and incomplete metrics that overemphasize peak attack success while ignoring stability and utility cost. To study these gaps, we build TFLlib, a uniform evaluation framework that supports image, text, and tabular FL tasks and re-implements representative poisoning attacks under practical settings. Our empirical study shows that idealized evaluation often overstates security risk. Under practical settings, attack performance becomes markedly more dataset-dependent and unstable, and several attacks that appear consistently strong in idealized FL lose effectiveness or incur clear benign-task degradation once practical constraints are enforced. These findings further show that final-round attack success alone is insufficient for security assessment; practical measurement must jointly consider effectiveness, temporal stability, and collateral utility loss. Overall, this work argues that many conclusions in the FL poisoning literature are not directly transferable to real deployments. By tightening the threat model and using measurement protocols aligned with practice, we provide a more realistic view of the security risks faced by contemporary FL systems and distill concrete guidance for future FL security evaluation. Our code is available at https://github.com/xaddwell/TFLlib
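The abstract's point that final-round attack success alone is insufficient can be illustrated with a small sketch. This is not TFLlib's API: the function name `summarize_attack`, its parameters, and the choice of a five-round tail window are hypothetical, picked only to show how effectiveness, temporal stability, and collateral utility loss might be reported together from per-round logs.

```python
from statistics import mean, pstdev


def summarize_attack(asr_per_round, clean_acc_per_round, baseline_acc, tail=5):
    """Summarize one poisoning run with joint metrics instead of a single number.

    asr_per_round       -- attack success rate after each FL round (0..1)
    clean_acc_per_round -- benign-task accuracy after each round (0..1)
    baseline_acc        -- benign-task accuracy of an attack-free run (0..1)
    tail                -- number of final rounds to average (assumed window)
    """
    tail_asr = asr_per_round[-tail:]
    tail_acc = clean_acc_per_round[-tail:]
    return {
        # Effectiveness: what a final-round or peak-only report would show.
        "final_asr": asr_per_round[-1],
        "peak_asr": max(asr_per_round),
        # Temporal stability: mean and spread over the last few rounds.
        "tail_mean_asr": mean(tail_asr),
        "tail_std_asr": pstdev(tail_asr),
        # Collateral utility loss: accuracy drop relative to the clean run.
        "utility_loss": baseline_acc - mean(tail_acc),
    }


# Made-up numbers for illustration: an attack that looks strong in the
# final round but oscillates heavily and costs benign accuracy.
report = summarize_attack(
    asr_per_round=[0.1, 0.9, 0.2, 0.8, 0.3, 0.9],
    clean_acc_per_round=[0.80, 0.80, 0.80, 0.80, 0.80, 0.80],
    baseline_acc=0.85,
)
print(report)
```

In this made-up run the final-round success rate equals the peak, yet the tail average is much lower and the spread is large, which is the kind of instability a single final-round figure would hide.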

federated cnn transformer Zhejiang University · Chongqing University · Southeast University

PDF arXiv Code

attack arXiv Sep 19, 2025 · Sep 2025

Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE

Xinpeng Liu, Junming Liu, Peiyu Liu et al. · Zhejiang University · EPFL

Hijacks LLM coding agents by embedding malicious payloads in config files, achieving persistent stealthy execution across nine AI-IDEs

Prompt Injection Insecure Plugin Design nlp

PDF

defense arXiv Aug 21, 2025 · Aug 2025

IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents

Hengyu An, Jinghuai Zhang, Tianyu Du et al. · Zhejiang University · University of California +1 more

Defends LLM agents against indirect prompt injection by constraining tool calls via a planned dependency graph

Prompt Injection Insecure Plugin Design nlp

PDF Code

defense arXiv Aug 21, 2025 · Aug 2025

VideoEraser: Concept Erasure in Text-to-Video Diffusion Models

Naen Xu, Jinghuai Zhang, Changjiang Li et al. · Zhejiang University · University of California +2 more

Training-free concept erasure framework prevents T2V diffusion models from generating harmful, private, or copyrighted content despite adversarial prompts

Output Integrity Attack generative vision

PDF

defense arXiv Jan 10, 2025 · Jan 2025

Fine-tuning is Not Fine: Mitigating Backdoor Attacks in GNNs with Limited Clean Data

Jiale Zhang, Bosen Rao, Chengcheng Zhu et al. · Yangzhou University · Zhejiang University +1 more

Defends GNNs against backdoor attacks via attention-transfer distillation using only 3% clean data to drop ASR below 5%

Model Poisoning graph

PDF

Papers in Database (9)