Latest papers

9 papers
defense arXiv Feb 5, 2026

HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection

Qing Wen, Haohao Li, Zhongjie Ba et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

Hypergraph-based audio deepfake detector modeling high-order feature interactions for superior cross-domain generalization

Output Integrity Attack audio
defense arXiv Feb 4, 2026

Semantic Consensus Decoding: Backdoor Defense for Verilog Code Generation

Guang Yang, Xing Hu, Xiang Chen et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more

Inference-time backdoor defense for LLMs suppresses trojan triggers in Verilog code generation via semantic consensus decoding

Model Poisoning nlp
attack arXiv Oct 3, 2025

Untargeted Jailbreak Attack

Xinzhe Huang, Wenjing Hu, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +3 more

Gradient-based untargeted jailbreak attack maximizes LLM unsafety probability without fixed response targets, achieving 80% ASR in 100 iterations

Input Manipulation Attack Prompt Injection nlp
2 citations
attack arXiv Oct 3, 2025

External Data Extraction Attacks against Retrieval-Augmented Large Language Models

Yu He, Yifei Chen, Yiming Li et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more

Proposes SECRET, an adaptive jailbreak-plus-retrieval-trigger attack that extracts RAG knowledge base contents verbatim from leading commercial LLMs

Sensitive Information Disclosure Prompt Injection nlp
1 citation
attack arXiv Oct 2, 2025

Dynamic Target Attack

Kedong Xiu, Churui Zeng, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +3 more

Gradient-based jailbreak attack using adaptive harmful-response sampling as optimization targets, achieving 87% ASR on safety-aligned LLMs in 200 iterations

Input Manipulation Attack Prompt Injection nlp
2 citations
defense arXiv Sep 17, 2025

Morphology-optimized Multi-Scale Fusion: Combining Local Artifacts and Mesoscopic Semantics for Deepfake Detection and Localization

Chao Shuai, Gaojian Wang, Kun Pan et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

Proposes morphological multi-scale fusion for deepfake detection that jointly localizes manipulated regions with noise suppression

Output Integrity Attack vision
benchmark arXiv Aug 27, 2025

SoK: Large Language Model Copyright Auditing via Fingerprinting

Shuo Shao, Yiming Li, Yu He et al. · Zhejiang University · Nanyang Technological University +3 more

Surveys LLM fingerprinting for copyright auditing and benchmarks 13 post-development robustness techniques across 149 model instances

Model Theft nlp
attack arXiv Aug 18, 2025

MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies

Weiwei Qi, Shuo Shao, Wei Gu et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more

Markov-chain jailbreak framework combines diverse disguise strategies adaptively, achieving 90%+ ASR on GPT-4o in under 15 queries

Prompt Injection nlp
defense arXiv Aug 13, 2025

Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference

Zhifan Luo, Shuo Shao, Su Zhang et al. · Zhejiang University · Huawei +1 more

Adversaries reconstruct private user prompts from LLM KV-cache via inversion, collision, and injection attacks; KV-Cloak defends with reversible matrix obfuscation

Model Inversion Attack Sensitive Information Disclosure nlp