HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection
Qing Wen, Haohao Li, Zhongjie Ba et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
Hypergraph-based audio deepfake detector modeling high-order feature interactions for superior cross-domain generalization
Advances in AIGC technologies have enabled the synthesis of highly realistic audio deepfakes capable of deceiving human auditory perception. Although numerous audio deepfake detection (ADD) methods have been developed, most rely on local temporal/spectral features or pairwise relations, overlooking high-order interactions (HOIs). HOIs capture discriminative patterns that emerge from multiple feature components beyond their individual contributions. We propose HyperPotter, a hypergraph-based framework that explicitly models these synergistic HOIs through clustering-based hyperedges with class-aware prototype initialization. Extensive experiments show that HyperPotter surpasses its baseline by an average relative gain of 22.15% across 11 datasets and outperforms state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets, demonstrating superior generalization to diverse attacks and speakers.
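To make the hyperedge-construction idea concrete, here is a minimal numpy sketch of clustering-based hyperedges with class-aware prototype initialization, followed by one standard hypergraph-propagation step. The function names, toy data, and nearest-prototype assignment rule are illustrative assumptions, not the HyperPotter implementation.

```python
import numpy as np

def build_hyperedges(feats, labels, k=2):
    """Group feature nodes into hyperedges whose centers are seeded from
    class-wise mean prototypes (a sketch of class-aware initialization)."""
    protos = np.stack([feats[labels == c].mean(0) for c in np.unique(labels)])
    d = ((feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)   # (N, E) distances
    H = np.zeros_like(d)                                          # incidence matrix
    H[np.arange(len(feats))[:, None], np.argsort(d, 1)[:, :k]] = 1.0
    return H

def hypergraph_conv(X, H, Theta):
    """One round of standard hypergraph propagation: X' = Dv^-1 H De^-1 H^T X Theta."""
    Dv = np.clip(H.sum(1), 1e-6, None)          # node degrees
    De = np.clip(H.sum(0), 1e-6, None)          # hyperedge degrees
    return (H / Dv[:, None]) @ ((H / De).T @ X) @ Theta

# toy example: 8 feature nodes, 2 classes, 4-dim features
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
H = build_hyperedges(X, y, k=1)
X_new = hypergraph_conv(X, H, rng.normal(size=(4, 4)))
print(H.shape, X_new.shape)
```

Each hyperedge connects several nodes at once, so one propagation step already mixes information across groups of features rather than pairs.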
Guang Yang, Xing Hu, Xiang Chen et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more
Inference-time backdoor defense for LLMs suppresses trojan triggers in Verilog code generation via semantic consensus decoding
Large language models (LLMs) for Verilog code generation are increasingly adopted in hardware design, yet remain vulnerable to backdoor attacks where adversaries inject malicious triggers during training to induce vulnerable hardware designs. Unlike patchable software vulnerabilities, hardware trojans become irreversible once fabricated, making remediation extremely costly or impossible. Existing active defenses require access to training data, which is impractical for third-party LLM users, while passive defenses struggle against semantically stealthy triggers that naturally blend into design specifications. In this paper, we hypothesize that under the requirements of both effectiveness and stealthiness, attackers are strongly biased toward embedding triggers in non-functional requirements (e.g., style modifiers, quality descriptors) rather than functional specifications that determine hardware behavior. Exploiting this insight, we propose Semantic Consensus Decoding (SCD), an inference-time passive defense with two key components: (1) functional requirement extraction that identifies essential requirements from user specifications, and (2) consensus decoding that adaptively fuses output distributions based on full user specifications and extracted functional requirements. When these distributions diverge significantly, SCD automatically suppresses suspicious components. Extensive experiments with three representative backdoor attacks demonstrate that SCD reduces average attack success rate from 89% to under 3% with negligible impact on generation quality.
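As a rough illustration of the consensus-decoding idea, the sketch below fuses two next-token distributions and leans on the functional-only distribution when it diverges from the full-spec one. The JSD-based weighting and the threshold `tau` are assumptions for illustration, not the paper's exact fusion rule.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two next-token distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def consensus_decode_step(p_full, p_func, tau=0.1):
    """Adaptively fuse the distribution conditioned on the full user spec with the
    one conditioned only on extracted functional requirements: the more they
    diverge, the more weight the functional-only distribution gets, suppressing
    tokens driven by suspicious non-functional phrases."""
    d = js_divergence(p_full, p_func)
    alpha = min(1.0, d / tau)                 # more divergence -> more suppression
    fused = (1 - alpha) * p_full + alpha * p_func
    return fused / fused.sum(), d

# toy 5-token vocabulary: the full-spec distribution favors a "trojaned" token (index 3)
p_full = np.array([0.05, 0.10, 0.10, 0.70, 0.05])
p_func = np.array([0.30, 0.40, 0.20, 0.05, 0.05])
fused, div = consensus_decode_step(p_full, p_func)
print(f"JSD={div:.3f}, fused argmax={fused.argmax()}")
```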
Xinzhe Huang, Wenjing Hu, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +3 more
Gradient-based untargeted jailbreak attack maximizes LLM unsafety probability without fixed response targets, achieving 80% ASR in 100 iterations
Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize adversarial suffixes to align the LLM output with predefined target responses. However, restricting the objective to inducing fixed targets inherently constrains the adversarial search space, limiting the overall attack efficacy. Furthermore, existing methods typically require numerous optimization iterations to bridge the large gap between the fixed target and the original LLM output, resulting in low attack efficiency. To overcome these limitations, we propose the first gradient-based untargeted jailbreak attack (UJA), which relies on an untargeted objective to maximize the unsafety probability of the LLM output, without enforcing any response patterns. For tractable optimization, we further decompose this objective into two differentiable sub-objectives to search for the optimal harmful response and the corresponding adversarial prompt, with a theoretical analysis to validate the decomposition. In contrast to existing attacks, UJA's unrestricted objective significantly expands the search space, enabling more flexible and efficient exploration of LLM vulnerabilities. Extensive evaluations show that UJA achieves over 80% attack success rates against recent safety-aligned LLMs with only 100 optimization iterations, outperforming the state-of-the-art gradient-based attacks by over 30%.
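The heavily simplified torch sketch below conveys the untargeted objective: instead of matching a fixed target string, it optimizes a relaxed adversarial suffix to maximize a differentiable unsafety score of the induced response distribution. The tiny linear "target model" and "judge" are stand-ins, and the sketch does not reproduce UJA's two-sub-objective decomposition.

```python
import torch

# Toy stand-ins: a linear "target model" mapping context embeddings to response
# logits, and a differentiable "judge" scoring how unsafe the response is.
torch.manual_seed(0)
vocab, dim = 50, 16
embed = torch.nn.Embedding(vocab, dim)
target_lm = torch.nn.Linear(dim, vocab)       # stands in for the LLM's next-token head
judge = torch.nn.Linear(vocab, 1)             # stands in for an unsafety classifier

prompt = torch.randint(0, vocab, (8,))
# Continuous relaxation of a 5-token adversarial suffix.
suffix_logits = torch.zeros(5, vocab, requires_grad=True)
opt = torch.optim.Adam([suffix_logits], lr=0.1)

for step in range(100):
    suffix_probs = torch.softmax(suffix_logits, dim=-1)
    suffix_emb = suffix_probs @ embed.weight              # soft suffix embeddings
    ctx = torch.cat([embed(prompt), suffix_emb], dim=0).mean(0)
    response_dist = torch.softmax(target_lm(ctx), dim=-1)
    # Untargeted objective: maximize the unsafety probability of the induced
    # response, with no fixed target string to match.
    unsafety = torch.sigmoid(judge(response_dist))
    loss = -unsafety.mean()
    opt.zero_grad(); loss.backward(); opt.step()

adv_suffix = suffix_logits.argmax(dim=-1)      # discretize the relaxed suffix
print(adv_suffix.tolist(), float(unsafety))
```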
Yu He, Yifei Chen, Yiming Li et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more
Proposes SECRET, an adaptive jailbreak-plus-retrieval-trigger attack that extracts RAG knowledge base contents verbatim from leading commercial LLMs
In recent years, retrieval-augmented generation (RAG) has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG alleviates issues like outdated knowledge and, crucially, insufficient domain expertise. While effective, RAG introduces new risks of external data extraction attacks (EDEAs), where sensitive or copyrighted data in its knowledge base may be extracted verbatim. These risks are particularly acute when RAG is used to customize specialized LLM applications with private knowledge bases. Despite initial studies exploring these risks, they often lack a formalized framework, robust attack performance, and comprehensive evaluation, leaving critical questions about real-world EDEA feasibility unanswered. In this paper, we present the first comprehensive study to formalize EDEAs against retrieval-augmented LLMs. We first formally define EDEAs and propose a unified framework decomposing their design into three components: extraction instruction, jailbreak operator, and retrieval trigger, under which prior attacks can be viewed as instances of our framework. Guided by this framework, we develop SECRET: a Scalable and EffeCtive exteRnal data Extraction aTtack. Specifically, SECRET incorporates (1) an adaptive optimization process using LLMs as optimizers to generate specialized jailbreak prompts for EDEAs, and (2) cluster-focused triggering, an adaptive strategy that alternates between global exploration and local exploitation to efficiently generate effective retrieval triggers. Extensive evaluations across 4 models reveal that SECRET significantly outperforms previous attacks, and is highly effective against all 16 tested RAG instances. Notably, SECRET successfully extracts 35% of the data from RAG powered by Claude 3.7 Sonnet for the first time, whereas other attacks yield 0% extraction. Our findings call for attention to this emerging threat.
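A minimal sketch of the cluster-focused triggering component, assuming retrieval triggers live in an embedding space: it alternates between sampling globally (exploration) and perturbing around the least-covered cluster of already-extracted chunks (exploitation). The clustering and perturbation details are illustrative assumptions, and the LLM-driven jailbreak-prompt optimization is omitted.

```python
import numpy as np
rng = np.random.default_rng(0)

def cluster_focused_trigger(extracted_embs, n_clusters=4, explore_p=0.5, dim=32):
    """Return the next retrieval-trigger embedding: with probability explore_p sample
    globally (exploration); otherwise perturb around the centroid of the cluster with
    the fewest extracted documents so far (exploitation)."""
    if len(extracted_embs) < n_clusters or rng.random() < explore_p:
        return rng.normal(size=dim)                       # global exploration
    # crude clustering of what has been extracted so far
    centers = extracted_embs[rng.choice(len(extracted_embs), n_clusters, replace=False)]
    assign = np.argmin(((extracted_embs[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    counts = np.bincount(assign, minlength=n_clusters)
    target = centers[counts.argmin()]                     # least-covered region
    return target + 0.1 * rng.normal(size=dim)            # local exploitation

# toy loop: pretend each trigger retrieves (extracts) one nearby document embedding
extracted = np.empty((0, 32))
for _ in range(20):
    trig = cluster_focused_trigger(extracted)
    doc = trig + 0.05 * rng.normal(size=32)               # stand-in for a retrieved chunk
    extracted = np.vstack([extracted, doc])
print(extracted.shape)
```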
Kedong Xiu, Churui Zeng, Tianhang Zheng et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +3 more
Gradient-based jailbreak attack using adaptive harmful-response sampling as optimization targets, achieving 87% ASR on safety-aligned LLMs in 200 iterations
Existing gradient-based jailbreak attacks typically optimize an adversarial suffix to induce a fixed affirmative response, e.g., "Sure, here is...". However, this fixed target usually resides in an extremely low-density region of a safety-aligned LLM's output distribution. Due to the substantial discrepancy between the fixed target and the output distribution, existing attacks require numerous iterations to optimize the adversarial prompt, which might still fail to induce the low-probability target response. To address this limitation, we propose Dynamic Target Attack (DTA), which leverages the target LLM's own responses as adaptive targets. In each optimization round, DTA samples multiple candidates from the output distribution conditioned on the current prompt, and selects the most harmful one as a temporary target for prompt optimization. Extensive experiments demonstrate that, under the white-box setting, DTA achieves over 87% average attack success rate (ASR) within 200 optimization iterations on recent safety-aligned LLMs, exceeding the state-of-the-art baselines by over 15% and reducing wall-clock time by 2-26x. Under the black-box setting, DTA employs a white-box LLM as a surrogate model for gradient-based optimization, achieving an average ASR of 77.5% against black-box models, exceeding prior transfer-based attacks by over 12%.
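The loop below sketches one round of the dynamic-target idea with stubbed callables standing in for the target LLM, the harmfulness judge, and the gradient-based suffix optimizer; all three stand-ins are hypothetical placeholders so the control flow can be read end to end.

```python
import random
random.seed(0)

def dta_round(prompt, suffix, sample_fn, harm_score_fn, optimize_fn, n_candidates=8):
    """One round of the dynamic-target loop: sample candidate responses from the
    model's own output distribution, pick the most harmful one as a temporary
    target, then run suffix optimization toward that target."""
    candidates = [sample_fn(prompt + suffix) for _ in range(n_candidates)]
    target = max(candidates, key=harm_score_fn)       # adaptive, high-density target
    return optimize_fn(prompt, suffix, target), target

# --- toy stand-ins so the sketch runs end to end (not a real LLM or judge) ---
sample_fn = lambda p: f"resp-{random.randint(0, 999)}"
harm_score_fn = lambda r: int(r.split('-')[1]) % 7    # pretend judge score
optimize_fn = lambda p, s, t: s + "!"                 # pretend one optimization step

suffix = ""
for _ in range(3):
    suffix, tgt = dta_round("how to ...", suffix, sample_fn, harm_score_fn, optimize_fn)
print(suffix, tgt)
```

Because the temporary target is drawn from the model's own high-probability outputs, each optimization round closes a much smaller gap than forcing a fixed affirmative prefix.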
Chao Shuai, Gaojian Wang, Kun Pan et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
Proposes morphological multi-scale fusion for deepfake detection that jointly localizes manipulated regions with noise suppression
While the pursuit of higher accuracy in deepfake detection remains a central goal, there is an increasing demand for precise localization of manipulated regions. Despite the remarkable progress made in classification-based detection, accurately localizing forged areas remains a significant challenge. A common strategy is to incorporate forged region annotations during model training alongside manipulated images. However, such approaches often neglect the complementary nature of local detail and global semantic context, resulting in suboptimal localization performance. Moreover, an often-overlooked aspect is the fusion strategy between local and global predictions. Naively combining the outputs from both branches can amplify noise and errors, thereby undermining the effectiveness of the localization. To address these issues, we propose a novel approach that independently predicts manipulated regions using both local and global perspectives. We employ morphological operations to fuse the outputs, effectively suppressing noise while enhancing spatial coherence. Extensive experiments reveal the effectiveness of each module in improving the accuracy and robustness of forgery localization.
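A generic sketch of the fusion step, assuming both branches output per-pixel manipulation probabilities: the maps are thresholded, intersected, and cleaned with morphological opening and closing via scipy.ndimage. The agreement rule and the 3x3 structuring element are assumptions, not the paper's exact design.

```python
import numpy as np
from scipy import ndimage

def fuse_masks(local_pred, global_pred, thr=0.5):
    """Fuse per-branch manipulation maps, then clean the result with morphological
    opening (removes isolated false positives) and closing (fills small holes)."""
    local_m = local_pred > thr
    global_m = global_pred > thr
    fused = local_m & global_m                       # keep regions both branches agree on
    fused = ndimage.binary_opening(fused, structure=np.ones((3, 3)))
    fused = ndimage.binary_closing(fused, structure=np.ones((3, 3)))
    return fused

# toy 64x64 maps: a forged square plus scattered noise in the local branch
rng = np.random.default_rng(0)
local_pred = rng.random((64, 64)) * 0.4
global_pred = rng.random((64, 64)) * 0.4
local_pred[20:40, 20:40] = 0.9
global_pred[18:42, 18:42] = 0.8
local_pred[rng.integers(0, 64, 30), rng.integers(0, 64, 30)] = 0.95   # speckle noise
print(fuse_masks(local_pred, global_pred).sum())
```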
Shuo Shao, Yiming Li, Yu He et al. · Zhejiang University · Nanyang Technological University +3 more
Surveys LLM fingerprinting for copyright auditing and benchmarks 13 post-development robustness techniques across 149 model instances
The broad capabilities and substantial resources required to train Large Language Models (LLMs) make them valuable intellectual property, yet they remain vulnerable to copyright infringement, such as unauthorized use and model theft. LLM fingerprinting, a non-intrusive technique that compares the distinctive features (i.e., fingerprint) of LLMs to identify whether an LLM is derived from another, offers a promising solution to copyright auditing. However, its reliability remains uncertain due to the prevalence of diverse model modifications and the lack of standardized evaluation. In this SoK, we present the first comprehensive study of the emerging field of LLM fingerprinting. We introduce a unified framework and taxonomy that structures the field: white-box methods are classified based on their feature source as static, forward-pass, or backward-pass fingerprinting, while black-box methods are distinguished by their query strategy as either untargeted or targeted. Furthermore, we propose LeaFBench, the first systematic benchmark for evaluating LLM fingerprinting under realistic deployment scenarios. Built upon 7 mainstream foundation models and comprising 149 distinct model instances, LeaFBench integrates 13 representative post-development techniques, spanning both parameter-altering methods (e.g., fine-tuning, quantization) and parameter-independent techniques (e.g., system prompts, RAG). Extensive experiments on LeaFBench reveal the strengths and weaknesses of existing methods, thereby outlining future research directions and critical open problems in this emerging field. The code is available at https://github.com/shaoshuo-ss/LeaFBench.
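To illustrate the black-box, untargeted branch of the taxonomy, here is a toy comparison that averages response similarity over fixed probe prompts; the stand-in models and the similarity metric are hypothetical and unrelated to LeaFBench's actual protocol.

```python
from difflib import SequenceMatcher

def blackbox_fingerprint(query_fn_a, query_fn_b, probes):
    """Untargeted black-box fingerprint comparison: send the same probe prompts to
    both models and average the textual similarity of their responses. High
    similarity suggests model B may be derived from model A."""
    sims = []
    for p in probes:
        a, b = query_fn_a(p), query_fn_b(p)
        sims.append(SequenceMatcher(None, a, b).ratio())
    return sum(sims) / len(sims)

# stand-in "models": B is a lightly post-processed copy of A, C is unrelated
model_a = lambda p: f"answer to {p} :: canonical phrasing"
model_b = lambda p: f"answer to {p} :: canonical phrasing!"     # derived copy
model_c = lambda p: f"something entirely different about {p}"   # independent model
probes = ["probe-1", "probe-2", "probe-3"]
print(blackbox_fingerprint(model_a, model_b, probes))   # high -> likely derived
print(blackbox_fingerprint(model_a, model_c, probes))   # low  -> likely independent
```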
Weiwei Qi, Shuo Shao, Wei Gu et al. · Zhejiang University · Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security +1 more
Markov-chain jailbreak framework combines diverse disguise strategies adaptively, achieving 90%+ ASR on GPT-4o in under 15 queries
Large Language Models (LLMs) have exhibited remarkable capabilities but remain vulnerable to jailbreaking attacks, which can elicit harmful content from the models by manipulating the input prompts. Existing black-box jailbreaking techniques primarily rely on static prompts crafted with a single, non-adaptive strategy, or employ rigid combinations of several underperforming attack methods, which limits their adaptability and generalization. To address these limitations, we propose MAJIC, a Markovian adaptive jailbreaking framework that attacks black-box LLMs by iteratively combining diverse innovative disguise strategies. MAJIC first establishes a "Disguise Strategy Pool" by refining existing strategies and introducing several innovative approaches. To further improve attack performance and efficiency, MAJIC formulates the sequential selection and fusion of strategies in the pool as a Markov chain. Under this formulation, MAJIC initializes and employs a Markov matrix to guide the strategy composition, where transition probabilities between strategies are dynamically adapted based on attack outcomes, thereby enabling MAJIC to learn and discover effective attack pathways tailored to the target model. Our empirical results demonstrate that MAJIC significantly outperforms existing jailbreak methods on prominent models such as GPT-4o and Gemini-2.0-flash, achieving over 90% attack success rate with fewer than 15 queries per attempt on average.
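A small sketch of the Markovian strategy selection, assuming a fixed illustrative strategy pool: transitions that lead to successful attacks are reinforced and each row of the transition matrix is renormalized. The pool names, learning rate, and update rule are assumptions rather than MAJIC's implementation.

```python
import numpy as np
rng = np.random.default_rng(0)

STRATEGIES = ["role_play", "encoding", "hypothetical", "payload_split"]  # illustrative pool

class MarkovStrategySelector:
    """Pick the next disguise strategy from a row-stochastic transition matrix and
    reinforce transitions that lead to successful attacks."""
    def __init__(self, n, lr=0.2):
        self.T = np.full((n, n), 1.0 / n)   # uniform initialization
        self.lr = lr

    def next_strategy(self, current):
        return rng.choice(len(self.T), p=self.T[current])

    def update(self, prev, chosen, success):
        delta = self.lr if success else -self.lr
        self.T[prev, chosen] = max(self.T[prev, chosen] + delta, 1e-3)
        self.T[prev] /= self.T[prev].sum()   # keep the row a probability distribution

sel = MarkovStrategySelector(len(STRATEGIES))
state = 0
for _ in range(10):
    nxt = sel.next_strategy(state)
    success = rng.random() < 0.4            # stand-in for querying the target LLM + judge
    sel.update(state, nxt, success)
    state = nxt
print(np.round(sel.T, 2))
```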
Zhifan Luo, Shuo Shao, Su Zhang et al. · Zhejiang University · Huawei +1 more
Adversaries reconstruct private user prompts from LLM KV-cache via inversion, collision, and injection attacks; KV-Cloak defends with reversible matrix obfuscation
The Key-Value (KV) cache, which stores intermediate attention computations (Key and Value pairs) to avoid redundant calculations, is a fundamental mechanism for accelerating Large Language Model (LLM) inference. However, this efficiency optimization introduces significant yet underexplored privacy risks. This paper provides the first comprehensive analysis of these vulnerabilities, demonstrating that an attacker can reconstruct sensitive user inputs directly from the KV-cache. We design and implement three distinct attack vectors: a direct Inversion Attack, a more broadly applicable and potent Collision Attack, and a semantic-based Injection Attack. These methods demonstrate the practicality and severity of KV-cache privacy leakage issues. To mitigate this, we propose KV-Cloak, a novel, lightweight, and efficient defense mechanism. KV-Cloak uses a reversible matrix-based obfuscation scheme, combined with operator fusion, to secure the KV-cache. Our extensive experiments show that KV-Cloak effectively thwarts all proposed attacks, reducing reconstruction quality to random noise. Crucially, it achieves this robust security with virtually no degradation in model accuracy and minimal performance overhead, offering a practical solution for trustworthy LLM deployment.
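The core reversible-matrix idea can be checked with a few lines of numpy: obfuscating cached keys with an invertible matrix M and folding M's inverse into the query side leaves the attention scores unchanged, so an attacker reading the cache sees only transformed keys. This sketch covers keys only and omits the value path and the operator-fusion details.

```python
import numpy as np
rng = np.random.default_rng(0)

d = 8                                   # head dimension
Q = rng.normal(size=(4, d))             # query vectors
K = rng.normal(size=(6, d))             # cached key vectors

# Obfuscate the cached keys with a secret invertible matrix M; apply the matching
# inverse transform on the query side so attention scores are preserved.
M = rng.normal(size=(d, d)) + d * np.eye(d)    # well-conditioned, invertible
K_obf = K @ M                                  # what gets stored in the KV-cache
Q_adj = Q @ np.linalg.inv(M).T                 # folded into the query projection

scores_plain = Q @ K.T
scores_obf = Q_adj @ K_obf.T                   # Q M^-T (K M)^T = Q K^T
print(np.allclose(scores_plain, scores_obf))   # True: attention unchanged
```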