Latest papers

6 papers
defense arXiv Mar 13, 2026

Test-Time Attention Purification for Backdoored Large Vision Language Models

Zhifang Zhang, Bojun Yang, Shuo He et al. · Southeast University · Nanyang Technological University +2 more

Test-time backdoor defense for LVLMs that detects poisoned inputs via cross-modal attention anomalies and purifies them by pruning trigger tokens

Model Poisoning · multimodal · nlp · vision
PDF
defense arXiv Feb 22, 2026

ReVision: A Post-Hoc, Vision-Based Technique for Replacing Unacceptable Concepts in Image Generation Pipeline

Gurjot Singh, Prabhjot Singh, Aashima Sharma et al. · University of Waterloo · University of Melbourne +2 more

Post-hoc VLM-assisted framework detects and edits policy-violating content in diffusion model outputs without retraining

Output Integrity Attack · vision · generative
PDF
defense arXiv Nov 26, 2025

Towards Reasoning-Preserving Unlearning in Multimodal Large Language Models

Hongji Li, Junchi Yao, Manjiang Yu et al. · Mohamed bin Zayed University of Artificial Intelligence · University of Queensland +1 more

Shows that chain-of-thought reasoning leaks sensitive memorized data even after unlearning; proposes an activation-steering defense for multimodal LLMs

Sensitive Information Disclosure · multimodal · nlp
1 citation PDF
attack arXiv Aug 21, 2025

Retrieval-Augmented Review Generation for Poisoning Recommender Systems

Shiyi Yang, Xinshu Li, Guanglin Zhou et al. · University of New South Wales · CSIRO’s Data61 +2 more

Poisons recommender systems by injecting LLM-generated fake user profiles, using retrieval-augmented in-context learning and jailbreaking to evade detection

Data Poisoning Attack · nlp
PDF
tool arXiv Aug 11, 2025

From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users

Shahroz Tariq, Simon S. Woo, Priyanka Singh et al. · CSIRO · Sungkyunkwan University +1 more

Builds an explainable deepfake detection pipeline combining Grad-CAM, visual captioning, and LLM-generated narratives for non-expert users

Output Integrity Attack · vision · nlp · multimodal
PDF
attack arXiv Aug 4, 2025

Controllable and Stealthy Shilling Attacks via Dispersive Latent Diffusion

Shutong Qiao, Wei Yuan, Junliang Yu et al. · University of Queensland · Griffith University

Diffusion-based shilling attack that generates stealthy fake user profiles to manipulate recommender system rankings while evading detection

Data Poisoning Attack · generative · graph
PDF