ML Security Papers

Latest papers

14 papers

defense The IEEE/CVF Conference on Com... Mar 25, 2026 · 14d ago

Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection

Zhanhe Lei, Zhongyuan Wang, Jikang Cheng et al. · Wuhan University · Peking University +2 more

Reinforcement learning curriculum that dynamically weights training samples to improve deepfake detector generalization against unseen attacks

Output Integrity Attack vision generative

PDF Code

defense arXiv Mar 19, 2026 · 20d ago

CNT: Safety-oriented Function Reuse across LLMs via Cross-Model Neuron Transfer

Yue Zhao, Yujia Gong, Ruigang Liang et al. · Chinese Academy of Sciences · Beijing University of Posts and Telecommunications +1 more

Transfers safety functionality between LLMs by transplanting minimal neuron subsets, enabling alignment enhancement and jailbreak defense without retraining

Prompt Injection nlp

PDF

attack arXiv Mar 1, 2026 · 5w ago

Hide&Seek: Remove Image Watermarks with Negligible Cost via Pixel-wise Reconstruction

Huajie Chen, Tianqing Zhu, Hailin Yang et al. · City University of Macau · CISPA Helmholtz Center for Information Security +1 more

Pixel-wise reconstruction attack removes AI-image watermarks without querying detectors or knowing the watermarking scheme

Output Integrity Attack vision generative

PDF

attack arXiv Jan 30, 2026 · 9w ago

Rethinking Transferable Adversarial Attacks on Point Clouds from a Compact Subspace Perspective

Keke Tang, Xianheng Liu, Weilong Peng et al. · Guangzhou University · University of Science and Technology of China +2 more

Transfers adversarial perturbations across 3D point cloud architectures via low-rank semantic subspace optimization

Input Manipulation Attack vision

PDF

defense arXiv Jan 18, 2026 · 11w ago

S^2F-Net: A Robust Spatial-Spectral Fusion Framework for Cross-Model AIGC Detection

Xiangyu Hu, Yicheng Hong, Hongchuang Zheng et al. · Guangdong Ocean University · Guangzhou University

Novel spatial-spectral fusion detector exploits frequency-domain artifacts from upsampling to generalize across unseen generative architectures

Output Integrity Attack vision

PDF

attack arXiv Dec 15, 2025 · Dec 2025

Less Is More: Sparse and Cooperative Perturbation for Point Cloud Attacks

Keke Tang, Tianyu Hao, Xiaofei Wang et al. · Guangzhou University · University of Science and Technology of China +2 more

Sparse adversarial attack on 3D point cloud classifiers using Hessian-guided cooperative subset perturbation for 100% attack success

Input Manipulation Attack vision

PDF

defense arXiv Nov 24, 2025 · Nov 2025

Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach

Fan Nie, Jiangqun Ni, Jian Zhang et al. · Sun Yat-Sen University · Pengcheng Laboratory +4 more

Novel variational Bayesian framework detects audio-visual deepfakes by modeling cross-modal inconsistencies as Gaussian latent variables

Output Integrity Attack multimodal vision audio generative

1 citation PDF

attack arXiv Nov 12, 2025 · Nov 2025

Transferable Hypergraph Attack via Injecting Nodes into Pivotal Hyperedges

Meixia He, Peican Zhu, Le Cheng et al. · Northwestern Polytechnical University · Inner Mongolia University +1 more

Adversarial node injection attack on hypergraph neural networks exploiting pivotal hyperedge vulnerability for transferable misclassification

Input Manipulation Attack graph

PDF

defense arXiv Oct 22, 2025 · Oct 2025

FPT-Noise: Dynamic Scene-Aware Counterattack for Test-Time Adversarial Defense in Vision-Language Models

Jia Deng, Jin Li, Zhenhua Zhao et al. · Guangzhou University · ByteDance

Test-time defense for CLIP that dynamically generates image-specific counterattack noise to neutralize adversarial perturbations without retraining

Input Manipulation Attack vision multimodal

2 citations PDF

defense arXiv Sep 27, 2025 · Sep 2025

CoSIFL: Collaborative Secure and Incentivized Federated Learning with Differential Privacy

Zhanhong Xie, Meifan Zhang, Lihua Yin · Guangzhou University

Defends federated learning against Byzantine poisoning and gradient inversion attacks via LDP and robust aggregation with game-theoretic incentives

Data Poisoning Attack Model Inversion Attack federated-learning

PDF

defense arXiv Sep 19, 2025 · Sep 2025

DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm

Xiaowei Zhu, Yubing Ren, Fang Fang et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +2 more

Zero-shot AI text detector using DNA-inspired mutation-repair scoring to distinguish LLM-generated from human-written text at SOTA accuracy

Output Integrity Attack nlp

PDF Code

defense arXiv Aug 31, 2025 · Aug 2025

PREE: Towards Harmless and Adaptive Fingerprint Editing in Large Language Models via Knowledge Prefix Enhancement

Xubin Yue, Zhenhua Xu, Wenpeng Xing et al. · Zhejiang University · GenTel.io +1 more

Embeds ownership fingerprints in LLM parameter offsets via dual-channel knowledge editing, resisting fine-tuning erasure and feature-space defenses

Model Theft Model Theft nlp

PDF

attack arXiv Aug 8, 2025 · Aug 2025

Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs

Wenpeng Xing, Mohan Li, Chunqiang Hu et al. · Binjiang Institute of Zhejiang University · Zhejiang University +3 more

White-box jailbreak fuses harmful and benign hidden states in latent space to bypass LLM safety alignment with 94% ASR

Input Manipulation Attack Prompt Injection nlp

PDF

attack arXiv Aug 4, 2025 · Aug 2025

Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools

Kanghua Mo, Li Hu, Yucheng Long et al. · Guangzhou University · The Hong Kong Polytechnic University

Attacks LLM agent tool selection via crafted metadata that induces malicious tool invocation with 81–95% success rate

Insecure Plugin Design nlp

PDF Code
