Latest papers

12 papers
defense · arXiv · Apr 19, 2026

Unveiling Deepfakes: A Frequency-Aware Triple Branch Network for Deepfake Detection

Qihao Shen, Jiaxing Xuan, Zhenguang Liu et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +4 more

Triple-branch deepfake detector that combines spatial and frequency features with mutual-information losses for robust cross-dataset generalization

Output Integrity Attack · vision, multimodal
attack · arXiv · Apr 16, 2026

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

Meng Chen, Kun Wang, Li Lu et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +2 more

Adversarial audio injection attack that hijacks large audio-language models with imperceptible perturbations that generalize across contexts

Input Manipulation Attack · Prompt Injection · audio, multimodal, nlp
defense · arXiv · Apr 9, 2026

Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models

Weiwei Qi, Zefeng Wu, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more

Identifies safety-critical LLM parameters via gradient analysis, enabling targeted safety tuning and preservation during fine-tuning

Prompt Injection · nlp
defense · arXiv · Feb 5, 2026

HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection

Qing Wen, Haohao Li, Zhongjie Ba et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

Hypergraph-based audio deepfake detector modeling high-order feature interactions for superior cross-domain generalization

Output Integrity Attack · audio
defense · arXiv · Feb 4, 2026

Semantic Consensus Decoding: Backdoor Defense for Verilog Code Generation

Guang Yang, Xing Hu, Xiang Chen et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more

Inference-time backdoor defense for LLMs that suppresses trojan triggers in Verilog code generation via semantic consensus decoding

Model Poisoning · nlp
attack · arXiv · Oct 3, 2025

Untargeted Jailbreak Attack

Xinzhe Huang, Wenjing Hu, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +3 more

Gradient-based untargeted jailbreak attack that maximizes the probability of an unsafe LLM response without fixed response targets, achieving 80% attack success rate (ASR) in 100 iterations

Input Manipulation Attack · Prompt Injection · nlp
2 citations
attack · arXiv · Oct 3, 2025

External Data Extraction Attacks against Retrieval-Augmented Large Language Models

Yu He, Yifei Chen, Yiming Li et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more

Proposes SECRET, an adaptive jailbreak-plus-retrieval-trigger attack that extracts RAG knowledge base contents verbatim from leading commercial LLMs

Sensitive Information Disclosure · Prompt Injection · nlp
1 citation
attack · arXiv · Oct 2, 2025

Dynamic Target Attack

Kedong Xiu, Churui Zeng, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +3 more

Gradient-based jailbreak attack using adaptive harmful-response sampling as optimization targets, achieving 87% ASR on safety-aligned LLMs in 200 iterations

Input Manipulation Attack · Prompt Injection · nlp
2 citations
defense · arXiv · Sep 17, 2025

Morphology-optimized Multi-Scale Fusion: Combining Local Artifacts and Mesoscopic Semantics for Deepfake Detection and Localization

Chao Shuai, Gaojian Wang, Kun Pan et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

Proposes morphological multi-scale fusion for deepfake detection that jointly localizes manipulated regions with noise suppression

Output Integrity Attack · vision
benchmark · arXiv · Aug 27, 2025

SoK: Large Language Model Copyright Auditing via Fingerprinting

Shuo Shao, Yiming Li, Yu He et al. · Zhejiang University · Nanyang Technological University +3 more

Surveys LLM fingerprinting for copyright auditing and benchmarks 13 post-development robustness techniques across 149 model instances

Model Theft · nlp
attack · arXiv · Aug 18, 2025

MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies

Weiwei Qi, Shuo Shao, Wei Gu et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more

Markov-chain jailbreak framework that adaptively combines diverse disguise strategies, achieving 90%+ ASR on GPT-4o in under 15 queries

Prompt Injection · nlp
defense · arXiv · Aug 13, 2025

Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference

Zhifan Luo, Shuo Shao, Su Zhang et al. · Zhejiang University · Huawei +1 more

Adversaries reconstruct private user prompts from LLM KV-cache via inversion, collision, and injection attacks; KV-Cloak defends with reversible matrix obfuscation

Model Inversion Attack · Sensitive Information Disclosure · nlp