Latest papers

15 papers
defense arXiv Mar 25, 2026 · 12d ago

High-Fidelity Face Content Recovery via Tamper-Resilient Versatile Watermarking

Peipeng Yu, Jinfeng Xie, Chengfu Ou et al. · Jinan University · University of Macau +2 more

Embeds semantic watermarks in face images for copyright protection, pixel-level deepfake localization, and content recovery after manipulation

Output Integrity Attack vision generative
PDF
attack arXiv Mar 10, 2026 · 27d ago

Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Fan Yang · Jinan University

Jailbreaks thinking-mode LLMs by interleaving multi-task streams, character reversal, and format constraints in a single prompt

Prompt Injection nlp
PDF Code
attack arXiv Mar 5, 2026 · 4w ago

Osmosis Distillation: Model Hijacking with the Fewest Samples

Yuchen Shi, Huajie Chen, Heng Xu et al. · City University of Macau · Jinan University +1 more

Poisons distilled synthetic datasets to embed hidden hijacking tasks in models fine-tuned via transfer learning

Data Poisoning Attack Transfer Learning Attack vision
PDF
defense arXiv Feb 26, 2026 · 5w ago

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

Tian Zhang, Yiwei Xu, Juan Wang et al. · Wuhan University · University at Buffalo +1 more

Defends LLM agents against indirect prompt injection via causal takeover detection and context purification at tool-return boundaries

Prompt Injection Insecure Plugin Design nlp
PDF
attack arXiv Feb 1, 2026 · 9w ago

GradingAttack: Attacking Large Language Models Towards Short Answer Grading Ability

Xueyi Li, Zhuoneng Zhou, Zitao Liu et al. · Guangdong Institute of Smart Education · Jinan University

Adversarial attack framework targeting LLM graders via token-level gradient perturbations and prompt-level natural-language manipulation

Input Manipulation Attack Prompt Injection nlp
PDF Code
defense arXiv Dec 31, 2025 · Dec 2025

Noise-Aware and Dynamically Adaptive Federated Defense Framework for SAR Image Target Recognition

Yuchao Hou, Zixuan Zhang, Jie Wang et al. · Shanxi Normal University · Guizhou University +7 more

Defends federated SAR image classifiers against backdoor attacks using frequency-domain trigger detection and noise-aware adversarial training

Model Poisoning vision federated-learning
PDF
attack arXiv Dec 26, 2025 · Dec 2025

Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models

Zongmin Zhang, Zhen Sun, Yifan Liao et al. · Hong Kong University of Science and Technology · Nanjing University of Aeronautics and Astronautics +2 more

Proposes BadVSFM, a two-stage backdoor attack on prompt-driven video segmentation models where classic backdoors fail (<5% ASR)

Model Poisoning vision
PDF
attack arXiv Dec 17, 2025 · Dec 2025

CLIP-FTI: Fine-Grained Face Template Inversion via CLIP-Driven Attribute Conditioning

Longchen Dai, Zixuan Shen, Zhiheng Zhou et al. · Jinan University

Inverts leaked face recognition templates into photorealistic face images using CLIP semantic conditioning and StyleGAN latent projection

Model Inversion Attack vision
PDF
attack arXiv Dec 5, 2025 · Dec 2025

Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models

Fan Yang · Jinan University

Jailbreaks LLMs by rewriting harmful prompts into safe-isomorphic ones, generating responses, then reverse-mapping to harmful outputs

Prompt Injection nlp
PDF
defense arXiv Nov 15, 2025 · Nov 2025

Fine-Grained DINO Tuning with Dual Supervision for Face Forgery Detection

Tianxiang Zhang, Peipeng Yu, Zhihua Xia et al. · Jinan University

Proposes DFF-Adapter, a LoRA-based multi-task DINOv2 fine-tuning method for fine-grained deepfake face detection

Output Integrity Attack vision
PDF
attack arXiv Nov 6, 2025 · Nov 2025

P-MIA: A Profiled-Based Membership Inference Attack on Cognitive Diagnosis Models

Mingliang Hou, Yinuo Wang, Teng Guo et al. · Jilin University · TAL Education Group +1 more

Grey-box membership inference attack on educational cognitive diagnosis models exploiting exposed knowledge state visualizations

Membership Inference Attack tabular
1 citation PDF
defense arXiv Oct 19, 2025 · Oct 2025

Rotation, Scale, and Translation Resilient Black-box Fingerprinting for Intellectual Property Protection of EaaS Models

Hongjie Zhang, Zhiqi Zhao, Hanzhou Wu et al. · Sichuan Normal University · Shanghai University +3 more

Fingerprints EaaS embedding models via point-cloud topology analysis to verify ownership, resilient to rotation, scale, and translation attacks

Model Theft vision nlp
PDF
defense arXiv Oct 16, 2025 · Oct 2025

An Information Asymmetry Game for Trigger-based DNN Model Watermarking

Chaoyue Huang, Gejian Zhao, Hanzhou Wu et al. · Shanghai University · Guizhou Normal University +2 more

Game-theoretic framework for robust DNN model watermarking that derives the attacker's optimal pruning budget and an exponential lower bound on watermark success rate (WSR)

Model Theft vision
PDF
defense arXiv Sep 25, 2025 · Sep 2025

FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies

Shuqiao Liang, Jian Liu, Renzhang Chen et al. · Jinan University

Proposes FerretNet, a lightweight 1.1M-parameter detector for AI-generated images using local pixel dependency reconstruction to expose generation artifacts

Output Integrity Attack vision generative
4 citations 1 influential PDF Code
defense arXiv Aug 9, 2025 · Aug 2025

The Cost of Thinking: Increased Jailbreak Risk in Large Language Models

Fan Yang · Jinan University

Finds that thinking-mode LLMs are more vulnerable to jailbreaks and defends them via safe-thinking intervention using special tokens

Prompt Injection nlp
PDF Code