Latest papers

10 papers
defense arXiv Mar 2, 2026 · 5w ago

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Yuchen Zhang, Yaxiong Wang, Kecheng Han et al. · Xi’an Jiaotong University · Hefei University of Technology +3 more

Proposes REFORM, a forensic-reasoning framework with curriculum learning and RL to generalize multimodal deepfake detection

Output Integrity Attack multimodal vision nlp generative
PDF
defense arXiv Jan 30, 2026 · 9w ago

FNF: Functional Network Fingerprint for Large Language Models

Yiheng Liu, Junhao Ning, Sichen Xia et al. · Northwestern Polytechnical University · Shaanxi Normal University

Training-free LLM fingerprinting via functional network activation patterns detects unauthorized model derivatives across architectures and scales

Model Theft Model Theft nlp
PDF Code
attack arXiv Jan 30, 2026 · 9w ago

Rethinking Transferable Adversarial Attacks on Point Clouds from a Compact Subspace Perspective

Keke Tang, Xianheng Liu, Weilong Peng et al. · Guangzhou University · University of Science and Technology of China +2 more

Transfers adversarial perturbations across 3D point cloud architectures via low-rank semantic subspace optimization

Input Manipulation Attack vision
PDF
benchmark arXiv Jan 10, 2026 · 12w ago

Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

Hongjun An, Yiliang Song, Jiangan Chen et al. · Northwestern Polytechnical University · China Telecom +1 more

Factorial framework diagnoses how manipulative natural-language prompts exploit RLHF alignment to make LLMs prioritize sycophancy over factual accuracy

Prompt Injection nlp
PDF
attack arXiv Dec 15, 2025 · Dec 2025

Less Is More: Sparse and Cooperative Perturbation for Point Cloud Attacks

Keke Tang, Tianyu Hao, Xiaofei Wang et al. · Guangzhou University · University of Science and Technology of China +2 more

Sparse adversarial attack on 3D point cloud classifiers using Hessian-guided cooperative subset perturbation for 100% attack success

Input Manipulation Attack vision
PDF
defense arXiv Nov 24, 2025 · Nov 2025

Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation

Junbo Zhang, Ran Chen, Qianli Zhou et al. · Northwestern Polytechnical University

Defends LLMs against jailbreaks via safety-representation intervention that reduces over-refusal without sacrificing safety alignment

Prompt Injection nlp
1 citation PDF
attack arXiv Nov 12, 2025 · Nov 2025

Transferable Hypergraph Attack via Injecting Nodes into Pivotal Hyperedges

Meixia He, Peican Zhu, Le Cheng et al. · Northwestern Polytechnical University · Inner Mongolia University +1 more

Adversarial node injection attack on hypergraph neural networks exploiting pivotal hyperedge vulnerability for transferable misclassification

Input Manipulation Attack graph
PDF
defense arXiv Sep 24, 2025 · Sep 2025

Dynamic Dual-level Defense Routing for Continual Adversarial Training

Wenxuan Wang, Chenglei Wang, Xuelin Qian · Northwestern Polytechnical University

Mixture-of-experts defense framework for continual adversarial training that avoids catastrophic forgetting across evolving attack sequences

Input Manipulation Attack vision
PDF
attack arXiv Sep 23, 2025 · Sep 2025

Latent Danger Zone: Distilling Unified Attention for Cross-Architecture Black-box Attacks

Yang Li, Chenyu Wang, Tingrui Wang et al. · Northwestern Polytechnical University · Zhejiang University

Diffusion-based black-box adversarial attack distills CNN and ViT attention to craft cross-architecture transferable adversarial examples

Input Manipulation Attack vision
PDF
attack arXiv Sep 18, 2025 · Sep 2025

Semantic Representation Attack against Aligned Large Language Models

Jiawei Lian, Jianhong Pan, Lefan Wang et al. · The Hong Kong Polytechnic University · Northwestern Polytechnical University

Jailbreaks safety-aligned LLMs by targeting semantic representation space rather than exact affirmative token patterns

Prompt Injection nlp
1 citation PDF Code