ML Security Papers

Latest papers

8 papers

tool arXiv Mar 29, 2026 · 8d ago

A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators

Lam Pham, Khoi Vu, Dat Tran et al. · Austrian Institute of Technology · FPT University +1 more

Deepfake speech detector analyzing how diverse bonafide sources and AI generators affect model generalization across datasets

Output Integrity Attack audio

PDF

attack arXiv Mar 13, 2026 · 24d ago

DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training

Ridwan Arefeen, Xiaoxiao Miao, Rong Tong et al. · Singapore Institute of Technology · Duke Kunshan University +1 more

Dual-stream speaker re-identification attack on anonymized voice using SSL and spectral features with staged transfer learning

Input Manipulation Attack audio

PDF

defense arXiv Mar 2, 2026 · 5w ago

Deepfake Forensics Adapter: A Dual-Stream Network for Generalizable Deepfake Detection

Jianfeng Liao, Yichen Wei, Raymond Chan Ching Bon et al. · Shenzhen Technology University · Singapore Institute of Technology +2 more

Proposes CLIP-based dual-stream deepfake detector combining global adapters and local facial anomaly streams for improved generalization

Output Integrity Attack vision

PDF Code

defense arXiv Feb 24, 2026 · 5w ago

RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces

Haonan An, Xiaohui Ye, Guang Hua et al. · South China University of Technology · Singapore Institute of Technology +1 more

Embeds face content as background watermark to robustly detect, localize, and recover manipulated face regions against removal attacks

Output Integrity Attack visiongenerative

PDF

The proliferation of AI-generated content has facilitated sophisticated face manipulation, severely undermining visual integrity and posing unprecedented challenges to intellectual property. In response, a common proactive defense leverages fragile watermarks to detect, localize, or even recover manipulated regions. However, these methods always assume an adversary unaware of the embedded watermark, overlooking their inherent vulnerability to watermark removal attacks. Furthermore, this fragility is exacerbated in the commonly used dual-watermark strategy that adds a robust watermark for image ownership verification, where mutual interference and limited embedding capacity reduce the fragile watermark's effectiveness. To address the gap, we propose RecoverMark, a watermarking framework that achieves robust manipulation localization, content recovery, and ownership verification simultaneously. Our key insight is twofold. First, we exploit a critical real-world constraint: an adversary must preserve the background's semantic consistency to avoid visual detection, even if they apply global, imperceptible watermark removal attacks. Second, using the image's own content (face, in this paper) as the watermark enhances extraction robustness. Based on these insights, RecoverMark treats the protected face content itself as the watermark and embeds it into the surrounding background. By designing a robust two-stage training paradigm with carefully crafted distortion layers that simulate comprehensive potential attacks and a progressive training strategy, RecoverMark achieves a robust watermark embedding in no fragile manner for image manipulation localization, recovery, and image IP protection simultaneously. Extensive experiments demonstrate the proposed RecoverMark's robustness against both seen and unseen attacks and its generalizability to in-distribution and out-of-distribution data.

gan diffusion cnn South China University of Technology · Singapore Institute of Technology · City University of Hong Kong

PDF arXiv DOI

attack arXiv Jan 26, 2026 · 10w ago

ARMOR: Agentic Reasoning for Methods Orchestration and Reparameterization for Robust Adversarial Attacks

Gabriel Lee Jun Rong, Christos Korgialas, Dion Jia Xu Ho et al. · Singapore Institute of Technology · Aristotle University of Thessaloniki +3 more

Agentic VLM/LLM system orchestrates CW, JSMA, and STA attacks to evade deepfake detectors with improved black-box transfer

Input Manipulation Attack visionmultimodalnlp

PDF

defense TDSC Jan 17, 2026 · 11w ago

Decoder Gradient Shields: A Family of Provable and High-Fidelity Methods Against Gradient-Based Box-Free Watermark Removal

Haonan An, Guang Hua, Wei Du et al. · City University of Hong Kong · Singapore Institute of Technology +3 more

Defends box-free model watermarks in generative model outputs against gradient-leakage-based removal attacks using provable gradient-manipulation shields

Output Integrity Attack visiongenerative

1 citations PDF

attack arXiv Oct 14, 2025 · Oct 2025

MS-GAGA: Metric-Selective Guided Adversarial Generation Attack

Dion J. X. Ho, Gabriel Lee Jun Rong, Niharika Shrivastava et al. · Columbia University · Singapore Institute of Technology +1 more

Dual-stream PGD attack crafts transferable, imperceptible adversarial examples that evade black-box deepfake detectors by 27% over SOTA

Input Manipulation Attack vision

2 citations PDF

attack arXiv Aug 26, 2025 · Aug 2025

SegReConcat: A Data Augmentation Method for Voice Anonymization Attack

Ridwan Arefeen, Xiaoxiao Miao, Rong Tong et al. · Singapore Institute of Technology · Duke Kunshan University +1 more

Attacks voice anonymization systems by augmenting ASV training data via word-level segment rearrangement to recover speaker identity

Output Integrity Attack audio

PDF Code

Latest papers

A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators

DAST: A Dual-Stream Voice Anonymization Attacker with Staged Training

Deepfake Forensics Adapter: A Dual-Stream Network for Generalizable Deepfake Detection

RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces

ARMOR: Agentic Reasoning for Methods Orchestration and Reparameterization for Robust Adversarial Attacks

Decoder Gradient Shields: A Family of Provable and High-Fidelity Methods Against Gradient-Based Box-Free Watermark Removal

MS-GAGA: Metric-Selective Guided Adversarial Generation Attack

SegReConcat: A Data Augmentation Method for Voice Anonymization Attack

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue