ML Security Papers

Latest papers

16 papers

defense arXiv Mar 4, 2026 · 4w ago

Machine Pareidolia: Protecting Facial Image with Emotional Editing

Binh M. Le, Simon S. Woo · Sungkyunkwan University

Defends facial privacy by emotionally editing images with a diffusion score network to evade FR systems via targeted identity impersonation

Input Manipulation Attack visiongenerative

PDF

benchmark arXiv Dec 16, 2025 · Dec 2025

A Deep Dive into Function Inlining and its Security Implications for ML-based Binary Analysis

Omar Abusabha, Jiyong Uhm, Tamer Abuhmed et al. · Sungkyunkwan University

Compiler function inlining manipulates binary features to evade ML-based malware detectors and binary analysis models at inference time

Input Manipulation Attack graph

PDF

defense arXiv Nov 18, 2025 · Nov 2025

Training-free Detection of AI-generated images via Cropping Robustness

Sungik Choi, Hankook Lee, Moontae Lee · LG AI Research · University of Illinois Chicago +1 more

Training-free AI-generated image detector using Haar wavelet sensitivity and self-supervised model cropping robustness

Output Integrity Attack visiongenerative

1 citations PDF

defense Asia-Pacific Computer Systems ... Nov 3, 2025 · Nov 2025

Rescuing the Unpoisoned: Efficient Defense against Knowledge Corruption Attacks on RAG Systems

Minseok Kim, Hankook Lee, Hyungjoon Koo · Sungkyunkwan University

Defends RAG systems from knowledge base poisoning via lightweight post-retrieval filtering, cutting attack success rates from 0.89 to 0.02

Prompt Injection nlp

2 citations 1 influentialPDF

defense arXiv Nov 3, 2025 · Nov 2025

Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models

Tae-Young Lee, Juwon Seo, Jong Hwan Ko et al. · Korea University · Kyung Hee University +1 more

Defends against unauthorized deepfake personalization by modifying diffusion models to resist subject-specific fine-tuning attacks

Output Integrity Attack visiongenerative

PDF Code

benchmark arXiv Oct 27, 2025 · Oct 2025

Through the Lens: Benchmarking Deepfake Detectors Against Moiré-Induced Distortions

Razaib Tariq, Minji Heo, Simon S. Woo et al. · Sungkyunkwan University · CSIRO’s Data61

Benchmarks 15 deepfake detectors against Moiré artifacts, showing up to 25.4% accuracy drop and demoiréing methods making detection worse

Output Integrity Attack vision

PDF

defense arXiv Oct 1, 2025 · Oct 2025

DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models

Seunghoo Hong, Geonho Son, Juhun Lee et al. · Sungkyunkwan University

Adversarial perturbations on DDIM inversion trajectories protect images from diffusion-based deepfake editing, outperforming AdvDM and Photoguard

Output Integrity Attack visiongenerative

PDF Code

defense CIKM Sep 27, 2025 · Sep 2025

Seeing Through the Blur: Unlocking Defocus Maps for Deepfake Detection

Minsun Jeon, Simon S. Woo · Sungkyunkwan University

Proposes deepfake detection using defocus blur maps as physically interpretable forensic signals distinguishing real from synthetic images

Output Integrity Attack visiongenerative

2 citations PDF Code

defense ACM MM Sep 26, 2025 · Sep 2025

SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake Detection

Inzamamul Alam, Md Tanvir Islam, Simon S. Woo · Sungkyunkwan University

Dual-domain CNN combining spatial and FFT spectral features for robust GAN/diffusion deepfake detection

Output Integrity Attack vision

1 citations PDF Code

defense arXiv Sep 26, 2025 · Sep 2025

AI Kill Switch for malicious web-based LLM agent

Sechan Lee, Sangdon Park · Sungkyunkwan University · POSTECH

Stops malicious LLM web agents by injecting invisible defensive prompts into website DOM to trigger built-in safety mechanisms

Prompt Injection Excessive Agency nlp

PDF

benchmark arXiv Sep 20, 2025 · Sep 2025

FakeChain: Exposing Shallow Cues in Multi-Step Deepfake Detection

Minji Heo, Simon S. Woo · Sungkyunkwan University

Benchmark exposing that deepfake detectors rely on last-stage artifacts, failing on multi-step hybrid forgeries by up to 58.83% F1

Output Integrity Attack visiongenerative

PDF Code

benchmark arXiv Sep 16, 2025 · Sep 2025

A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs

Kiho Lee, Jungkon Kim, Doowon Kim et al. · ETRI · Samsung Research +2 more

Benchmarks seven PEFT methods for code LLM security; prompt-tuning best resists TrojanPuzzle backdoor attacks while improving secure code generation

Model Poisoning nlp

PDF

defense arXiv Aug 22, 2025 · Aug 2025

PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting

Hohyun Na, Seunghoo Hong, Simon S. Woo · Sungkyunkwan University

Defends images from diffusion-based inpainting attacks by injecting cross-attention decoy noise targeting prompt-invariant tokens

Output Integrity Attack visiongenerative

PDF Code

tool arXiv Aug 18, 2025 · Aug 2025

Unlearning Comparator: A Visual Analytics System for Comparative Evaluation of Machine Unlearning Methods

Jaeung Lee, Suhyeon Yu, Yurim Jang et al. · Sungkyunkwan University · Rice University

Visual analytics tool for comparing machine unlearning methods, with integrated membership inference attack simulation to assess privacy

Membership Inference Attack vision

PDF Code

tool arXiv Aug 11, 2025 · Aug 2025

From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users

Shahroz Tariq, Simon S. Woo, Priyanka Singh et al. · CSIRO · Sungkyunkwan University +1 more

Builds an explainable deepfake detection pipeline combining Grad-CAM, visual captioning, and LLM-generated narratives for non-expert users

Output Integrity Attack visionnlpmultimodal

PDF

defense arXiv Aug 4, 2025 · Aug 2025

Pigeon-SL: Robust Split Learning Framework for Edge Intelligence under Malicious Clients

Sangjun Park, Tony Q.S. Quek, Hyowoon Seo · Kwangwoon University · Singapore University of Technology and Design +1 more

Defends split learning against malicious clients via pigeonhole-based cluster partitioning that isolates and discards poisoned updates

Data Poisoning Attack federated-learning

PDF

Latest papers

Machine Pareidolia: Protecting Facial Image with Emotional Editing

A Deep Dive into Function Inlining and its Security Implications for ML-based Binary Analysis

Training-free Detection of AI-generated images via Cropping Robustness

Rescuing the Unpoisoned: Efficient Defense against Knowledge Corruption Attacks on RAG Systems

Perturb a Model, Not an Image: Towards Robust Privacy Protection via Anti-Personalized Diffusion Models

Through the Lens: Benchmarking Deepfake Detectors Against Moiré-Induced Distortions

DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models

Seeing Through the Blur: Unlocking Defocus Maps for Deepfake Detection

SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake Detection

AI Kill Switch for malicious web-based LLM agent

FakeChain: Exposing Shallow Cues in Multi-Step Deepfake Detection

A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs

PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting

Unlearning Comparator: A Visual Analytics System for Comparative Evaluation of Machine Unlearning Methods

From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users

Pigeon-SL: Robust Split Learning Framework for Edge Intelligence under Malicious Clients

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue