Shouling Ji

Papers in Database (9)

defense arXiv Mar 4, 2026 · 5w ago

Beyond Input Guardrails: Reconstructing Cross-Agent Semantic Flows for Execution-Aware Attack Detection

Yangyang Wei, Yijie Xu, Zhenyuan Li et al. · Zhejiang University · Hofstra University

Defends multi-agent LLM systems against indirect prompt injection by reconstructing cross-agent semantic flows for behavioral anomaly detection (sketch below)

Prompt Injection Excessive Agency nlp
PDF Code
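A minimal Python sketch of the execution-aware idea: reconstruct the chain of cross-agent actions and flag steps that drift semantically from the user's task. The bag-of-words similarity stands in for a real embedding model; all names and the threshold are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: flag cross-agent actions semantically unrelated to the task.
from collections import Counter
import math

def bow_cosine(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity (stand-in for a real embedding model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def detect_semantic_drift(user_task: str, actions: list[str], threshold: float = 0.2):
    """Flag reconstructed agent actions whose semantics drift from the task."""
    return [(i, a) for i, a in enumerate(actions) if bow_cosine(user_task, a) < threshold]

task = "summarize the quarterly sales report"
actions = ["read sales report pages", "summarize quarterly sales figures",
           "send credentials file to external server"]  # step injected via a tool result
print(detect_semantic_drift(task, actions))  # -> [(2, 'send credentials ...')]
```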
defense arXiv Feb 6, 2026 · 8w ago

TrapSuffix: Proactive Defense Against Adversarial Suffixes in Jailbreaking

Mengyao Du, Han Fang, Haokai Ma et al. · National University of Defense Technology · National University of Singapore +1 more

Proactive fine-tuning defense traps gradient-based jailbreak suffixes or fingerprints them, cutting LLM attack success below 0.01% (sketch below)

Input Manipulation Attack Prompt Injection nlp
PDF
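A minimal sketch of the inference-time side, assuming the fine-tuned model steers gradient-based suffix optimizers onto planted trap strings that then serve as fingerprints; the trap suffixes below are illustrative placeholders, not the paper's.

```python
# Hypothetical sketch: after trap fine-tuning, suffix optimizers converge onto
# planted decoys, so detection reduces to matching known trap fingerprints.
TRAP_SUFFIXES = ["@@trap-alpha@@", "@@trap-beta@@"]  # planted decoys (illustrative)

def is_trapped(prompt: str) -> bool:
    """Flag prompts carrying a planted trap suffix (attack fingerprint)."""
    return any(trap in prompt for trap in TRAP_SUFFIXES)

print(is_trapped("Write a story. describing it @@trap-alpha@@"))  # True -> block/log
print(is_trapped("Write a story about a lighthouse."))            # False -> pass
```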
tool arXiv Sep 4, 2025 · Sep 2025

NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models

Chuhan Zhang, Ye Zhang, Bowen Shi et al. · Zhejiang University

Builds a neuron-level analysis tool that dissects LLM jailbreak mechanisms via layer-wise probing and critical-neuron identification (sketch below)

Prompt Injection nlp
PDF
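A minimal sketch of the neuron-level probing idea with mocked activations; ranking neurons by the activation gap between jailbreak and benign prompts is an assumed heuristic, not necessarily the paper's exact criterion.

```python
# Hypothetical sketch: rank neurons in one layer by how much their mean
# activation shifts between benign and jailbreak prompts (activations mocked).
import numpy as np

rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(64, 512))     # (prompts, neurons)
jailbreak = rng.normal(0.0, 1.0, size=(64, 512))
jailbreak[:, 42] += 3.0                           # plant one "critical" neuron

def critical_neurons(benign_acts, attack_acts, top_k=5):
    """Layer-wise probe: neurons whose activations shift most under jailbreaks."""
    gap = np.abs(attack_acts.mean(axis=0) - benign_acts.mean(axis=0))
    return np.argsort(gap)[::-1][:top_k]

print(critical_neurons(benign, jailbreak))        # neuron 42 ranks first
```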
defense arXiv Aug 30, 2025 · Aug 2025

FreeTalk: A plug-and-play and black-box defense against speech synthesis attacks

Yuwen Pu, Zhou Feng, Chunyi Zhou et al. · Chongqing University · Zhejiang University

Adds frequency-domain adversarial perturbations to audio in a black-box setting to prevent voice cloning by VC/TTS models (sketch below)

Input Manipulation Attack audio
PDF
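A minimal numpy sketch of a frequency-domain perturbation of the kind the summary describes; the band, budget, and random direction are illustrative choices, not FreeTalk's optimized perturbation.

```python
# Hypothetical sketch: perturb a waveform in a high-frequency band so cloning
# models mis-learn the speaker while the audio stays perceptually close.
import numpy as np

def freq_perturb(wave, sr=16000, eps=0.02, band=(4000, 8000)):
    """Add a small random perturbation restricted to one frequency band."""
    spec = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sr)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    noise = np.random.randn(mask.sum()) + 1j * np.random.randn(mask.sum())
    spec[mask] += eps * np.abs(spec[mask]).mean() * noise
    return np.fft.irfft(spec, n=len(wave)).astype(wave.dtype)

wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
protected = freq_perturb(wave)
print(np.max(np.abs(protected - wave)))  # perturbation stays small
```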
benchmark arXiv Mar 21, 2026 · 18d ago

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

Jiahao Chen, Zhiming Zhao, Yuwen Pu et al. · Zhejiang University · Chongqing University +1 more

Measurement study showing FL poisoning attacks are less effective in practice than research suggests due to heterogeneity and stability constraints

Data Poisoning Attack vision nlp tabular federated-learning
PDF Code
attack arXiv Sep 19, 2025 · Sep 2025

Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE

Xinpeng Liu, Junming Liu, Peiyu Liu et al. · Zhejiang University · EPFL

Hijacks LLM coding agents by embedding malicious payloads in config files, achieving persistent, stealthy execution across nine AI-IDEs (sketch below)

Prompt Injection Insecure Plugin Design nlp
PDF
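A minimal, defense-oriented sketch of the attack surface: AI-IDE rules and config files are folded into the coding agent's context, so instruction-like payloads in them can drive execution. The scanner below is illustrative; the file patterns and regexes are assumptions, not a list from the paper.

```python
# Hypothetical sketch: crude scan of project config files for payload-like text.
import re
from pathlib import Path

SUSPICIOUS = re.compile(
    r"(ignore (all |previous )?instructions|curl\s+http|base64\s+-d|exec\()",
    re.IGNORECASE)

def scan_configs(root=".", patterns=(".cursorrules", "*.mdc", "settings.json")):
    """Flag config files containing instruction- or execution-like payloads."""
    hits = []
    for pat in patterns:
        for path in Path(root).rglob(pat):
            if SUSPICIOUS.search(path.read_text(errors="ignore")):
                hits.append(str(path))
    return hits

print(scan_configs())  # paths worth a manual review before trusting the agent
```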
defense arXiv Aug 21, 2025 · Aug 2025

IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents

Hengyu An, Jinghuai Zhang, Tianyu Du et al. · Zhejiang University · University of California +1 more

Defends LLM agents against indirect prompt injection by constraining tool calls via a pre-planned tool dependency graph (sketch below)

Prompt Injection Insecure Plugin Design nlp
PDF Code
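A minimal sketch of the core mechanism, assuming the agent commits to a tool-dependency plan before reading any untrusted data and execution is then confined to that plan; class and tool names are hypothetical.

```python
# Hypothetical sketch: tool calls are authorized only if the planning phase,
# run before any untrusted content is read, declared them.
class PlannedToolGraph:
    def __init__(self, edges):
        self.allowed = set(edges)  # (step, tool) pairs fixed at planning time

    def authorize(self, step: str, tool: str) -> bool:
        """Permit a tool call only if the plan declared it."""
        return (step, tool) in self.allowed

plan = PlannedToolGraph({("fetch_email", "read_inbox"),
                         ("summarize", "llm_generate")})
print(plan.authorize("summarize", "llm_generate"))  # True: planned call
print(plan.authorize("summarize", "send_email"))    # False: injected call blocked
```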
defense arXiv Aug 21, 2025 · Aug 2025

VideoEraser: Concept Erasure in Text-to-Video Diffusion Models

Naen Xu, Jinghuai Zhang, Changjiang Li et al. · Zhejiang University · University of California +2 more

Training-free concept-erasure framework that prevents T2V diffusion models from generating harmful, private, or copyrighted content despite adversarial prompts (sketch below)

Output Integrity Attack generative vision
PDF
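A minimal sketch of one plausible training-free erasure primitive: projecting the prompt embedding away from a target-concept direction before it conditions the diffusion model. This is an assumption about the general mechanism, not VideoEraser's exact pipeline.

```python
# Hypothetical sketch: remove the prompt embedding's component along a
# concept direction so the conditioned model cannot express that concept.
import numpy as np

def erase_concept(prompt_emb, concept_emb, strength=1.0):
    """Project the prompt embedding away from the (unit) concept direction."""
    c = concept_emb / np.linalg.norm(concept_emb)
    return prompt_emb - strength * np.dot(prompt_emb, c) * c

rng = np.random.default_rng(1)
prompt_emb, concept_emb = rng.normal(size=768), rng.normal(size=768)
cleaned = erase_concept(prompt_emb, concept_emb)
c = concept_emb / np.linalg.norm(concept_emb)
print(abs(np.dot(cleaned, c)))  # ~0: concept component removed
```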
defense arXiv Jan 10, 2025 · Jan 2025

Fine-tuning is Not Fine: Mitigating Backdoor Attacks in GNNs with Limited Clean Data

Jiale Zhang, Bosen Rao, Chengcheng Zhu et al. · Yangzhou University · Zhejiang University +1 more

Defends GNNs against backdoor attacks via attention-transfer distillation, using only 3% clean data to drop the attack success rate (ASR) below 5% (sketch below)

Model Poisoning graph
PDF
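A minimal sketch of an attention-transfer distillation loss of the kind the summary describes, with mocked graph-attention maps; the exact objective and the 3%-clean-data training loop are the paper's, not reproduced here.

```python
# Hypothetical sketch: align a fine-tuned student's normalized attention map
# with a frozen teacher's on clean data (Zagoruyko-style AT loss, maps mocked).
import numpy as np

def attention_transfer_loss(student_attn, teacher_attn):
    """L2 distance between L2-normalized attention maps."""
    s = student_attn / (np.linalg.norm(student_attn) + 1e-8)
    t = teacher_attn / (np.linalg.norm(teacher_attn) + 1e-8)
    return float(np.sum((s - t) ** 2))

rng = np.random.default_rng(2)
teacher = rng.random((32, 32))                  # attention over a 32-node graph
student = teacher + 0.1 * rng.random((32, 32))  # slightly drifted student
print(attention_transfer_loss(student, teacher))  # small: maps nearly aligned
```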