ML Security Papers

Latest papers

10 papers

defense arXiv Mar 2, 2026 · 5w ago

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Yuchen Zhang, Yaxiong Wang, Kecheng Han et al. · Xi’an Jiaotong University · Hefei University of Technology +3 more

Proposes REFORM, a forensic-reasoning framework with curriculum learning and RL to generalize multimodal deepfake detection

Output Integrity Attack multimodal vision nlp generative

PDF

defense arXiv Jan 30, 2026 · 9w ago

FNF: Functional Network Fingerprint for Large Language Models

Yiheng Liu, Junhao Ning, Sichen Xia et al. · Northwestern Polytechnical University · Shaanxi Normal University

Training-free LLM fingerprinting via functional network activation patterns detects unauthorized model derivatives across architectures and scales

Model Theft Model Theft nlp

PDF Code

attack arXiv Jan 30, 2026 · 9w ago

Rethinking Transferable Adversarial Attacks on Point Clouds from a Compact Subspace Perspective

Keke Tang, Xianheng Liu, Weilong Peng et al. · Guangzhou University · University of Science and Technology of China +2 more

Transfers adversarial perturbations across 3D point cloud architectures via low-rank semantic subspace optimization

Input Manipulation Attack vision

PDF

benchmark arXiv Jan 10, 2026 · 12w ago

Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

Hongjun An, Yiliang Song, Jiangan Chen et al. · Northwestern Polytechnical University · China Telecom +1 more

Factorial framework diagnoses how manipulative natural-language prompts exploit RLHF alignment to make LLMs prioritize sycophancy over factual accuracy

Prompt Injection nlp

PDF

attack arXiv Dec 15, 2025 · Dec 2025

Less Is More: Sparse and Cooperative Perturbation for Point Cloud Attacks

Keke Tang, Tianyu Hao, Xiaofei Wang et al. · Guangzhou University · University of Science and Technology of China +2 more

Sparse adversarial attack on 3D point cloud classifiers using Hessian-guided cooperative subset perturbation for 100% attack success

Input Manipulation Attack vision

PDF

defense arXiv Nov 24, 2025 · Nov 2025

Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation

Junbo Zhang, Ran Chen, Qianli Zhou et al. · Northwestern Polytechnical University

Defends LLMs against jailbreaks via safety-representation intervention that reduces over-refusal without sacrificing safety alignment

Prompt Injection nlp

1 citation PDF

attack arXiv Nov 12, 2025 · Nov 2025

Transferable Hypergraph Attack via Injecting Nodes into Pivotal Hyperedges

Meixia He, Peican Zhu, Le Cheng et al. · Northwestern Polytechnical University · Inner Mongolia University +1 more

Adversarial node injection attack on hypergraph neural networks exploiting pivotal hyperedge vulnerability for transferable misclassification

Input Manipulation Attack graph

PDF

defense arXiv Sep 24, 2025 · Sep 2025

Dynamic Dual-level Defense Routing for Continual Adversarial Training

Wenxuan Wang, Chenglei Wang, Xuelin Qian · Northwestern Polytechnical University

Mixture-of-experts defense framework for continual adversarial training that avoids catastrophic forgetting across evolving attack sequences

Input Manipulation Attack vision

PDF

attack arXiv Sep 23, 2025 · Sep 2025

Latent Danger Zone: Distilling Unified Attention for Cross-Architecture Black-box Attacks

Yang Li, Chenyu Wang, Tingrui Wang et al. · Northwestern Polytechnical University · Zhejiang University

Diffusion-based black-box adversarial attack distills CNN and ViT attention to craft cross-architecture transferable adversarial examples

Input Manipulation Attack vision

PDF

attack arXiv Sep 18, 2025 · Sep 2025

Semantic Representation Attack against Aligned Large Language Models

Jiawei Lian, Jianhong Pan, Lefan Wang et al. · The Hong Kong Polytechnic University · Northwestern Polytechnical University

Jailbreaks safety-aligned LLMs by targeting semantic representation space rather than exact affirmative token patterns

Prompt Injection nlp

1 citation PDF Code
