ML Security Papers

Latest papers

10 papers

defense arXiv Mar 28, 2026 · 11d ago

Diagnosing and Repairing Unsafe Channels in Vision-Language Models via Causal Discovery and Dual-Modal Safety Subspace Projection

Jinhu Fu, Yihang Lou, Qingyi Si et al. · Beijing University of Posts and Telecommunications · Chongqing University of Posts and Telecommunications +2 more

Identifies and repairs unsafe neural pathways in VLMs using causal mediation analysis and dual-modal safety subspace projection

Input Manipulation Attack Prompt Injection multimodal vision nlp

PDF

defense arXiv Mar 10, 2026 · 29d ago

When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection

Chao Shuai, Zhenguang Liu, Shaojing Fan et al. · Zhejiang University · National University of Singapore +1 more

Proposes GSD module to block semantic shortcuts in VFM-based detectors, improving generalization to unseen AI-generated image pipelines

Output Integrity Attack vision generative

PDF Code

defense arXiv Feb 28, 2026 · 5w ago

Diversity over Uniformity: Rethinking Representation in Generated Image Detection

Qinghui He, Haifeng Zhang, Qiao Qin et al. · Chongqing University of Posts and Telecommunications · Ltd. +1 more

Proposes anti-feature-collapse learning to diversify forgery cues in AI-generated image detectors, improving cross-model generalization

Output Integrity Attack vision generative

PDF Code

benchmark arXiv Jan 24, 2026 · 10w ago

OTI: A Model-free and Visually Interpretable Measure of Image Attackability

Jiaming Liang, Haowei Liu, Chi-Man Pun · University of Macau · Chongqing University of Posts and Telecommunications

Proposes OTI, a model-free texture-based metric for quantifying per-image adversarial vulnerability without model access

Input Manipulation Attack vision

PDF Code

defense arXiv Dec 15, 2025 · Dec 2025

CausalCLIP: Causally-Informed Feature Disentanglement and Filtering for Generalizable Detection of Generated Images

Bo Liu, Qiao Qin, Qinghui He · Chongqing University of Posts and Telecommunications · School of Artificial Intelligence +1 more

Proposes causal feature disentanglement in CLIP representations to generalize AI-generated image detection across unseen generative models

Output Integrity Attack vision

2 citations PDF

defense arXiv Dec 3, 2025 · Dec 2025

FeatureLens: A Highly Generalizable and Interpretable Framework for Detecting Adversarial Examples Based on Image Features

Zhigang Yang, Yuan Liu, Jiawei Zhang et al. · Chongqing University of Posts and Telecommunications · Chongqing University of Arts and Sciences

Lightweight adversarial example detector using 51-dim image features and shallow classifiers, generalizing across FGSM, PGD, CW, and DAmageNet attacks

Input Manipulation Attack vision

PDF

attack arXiv Dec 2, 2025 · Dec 2025

LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems

Yuanhe Zhang, Weiliu Wang, Zhenhong Zhou et al. · Beijing University of Posts and Telecommunications · Hangzhou Dianzi University +4 more

LeechHijack backdoors MCP tools to covertly parasitize LLM agent compute via runtime C2 channel, achieving 77% success undetected

Insecure Plugin Design nlp

1 citation PDF

defense International Journal of Compu... Nov 14, 2025 · Nov 2025

Unsupervised Robust Domain Adaptation: Paradigm, Theory and Algorithm

Fuxiang Huang, Xiaowei Fu, Shiyu Ye et al. · Chongqing University · Lingnan University +3 more

Defends unsupervised domain adaptation models against adversarial attacks via disentangled distillation post-training

Input Manipulation Attack vision

PDF

defense arXiv Sep 14, 2025 · Sep 2025

Make Identity Unextractable yet Perceptible: Synthesis-Based Privacy Protection for Subject Faces in Photos

Tao Wang, Yushu Zhang, Xiangli Xiao et al. · Nanjing University of Aeronautics and Astronautics · Jiangxi University of Finance and Economics +1 more

Synthesis-based anti-face-recognition defense generates perceptible yet identity-unextractable faces to defeat unauthorized FR systems

Input Manipulation Attack vision generative

PDF Code

Deep learning-based face recognition (FR) technology exacerbates privacy concerns in photo sharing. In response, the research community has developed a suite of anti-FR methods to block identity extraction by unauthorized FR systems. Because their alterations are quasi-imperceptible, perturbation-based methods appear well-suited to protecting subject faces in photos: familiar persons can still recognize the subjects with the naked eye. However, we show through theoretical analysis and experimental validation that perturbation-based methods provide a false sense of privacy, so alternative solutions are needed to protect subject faces. In this paper, we explore synthesis-based methods as a promising solution, whose challenge is to enable familiar persons to recognize subjects. To address this challenge, we present a key insight: in most photo-sharing scenarios, familiar persons recognize subjects through identity perception rather than meticulous face analysis. Based on this insight, we propose PerceptFace, the first synthesis-based method dedicated to subject faces, which makes identity unextractable yet perceptible. To enhance identity perception, we design a new perceptual similarity loss for faces that reduces alteration in regions to which human vision is highly sensitive. As a synthesis-based method, PerceptFace inherently provides reliable identity protection. Meanwhile, freed from the confines of meticulous face analysis, PerceptFace targets identity perception in a more practical scenario, which the perceptual similarity loss further enhances. Extensive experiments show that PerceptFace achieves a superior trade-off between identity protection and identity perception compared with existing methods. We provide a public API for PerceptFace and believe it has great potential to become a practical anti-FR tool.

cnn gan Nanjing University of Aeronautics and Astronautics · Jiangxi University of Finance and Economics · Chongqing University of Posts and Telecommunications

PDF arXiv Code

defense arXiv Jan 9, 2025 · Jan 2025

A New Perspective on Privacy Protection in Federated Learning with Granular-Ball Computing

Guannan Lai, Yihui Feng, Xin Yang et al. · Southwestern University of Finance and Economics · Chongqing University of Posts and Telecommunications +1 more

Defends federated learning against gradient reconstruction attacks by transforming images into coarse-grained graph structures before training

Model Inversion Attack vision federated-learning graph

PDF Code
