defense 2025

PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection

Tuan T. Nguyen 1, Naseem Khan 1, Khang Tran 2, NhatHai Phan 2, Issa Khalil 1

0 citations · 73 references · arXiv

Published on arXiv

2509.26272

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

PRPO achieves the highest reasoning score of 4.55/5.0 and significantly outperforms GRPO for deepfake detection accuracy and explanation quality.

PRPO (Paragraph-level Relative Policy Optimization)

Novel technique introduced


The rapid rise of synthetic media has made deepfake detection a critical challenge for online safety and trust. Progress remains constrained by the scarcity of large, high-quality datasets. Although multimodal large language models (LLMs) exhibit strong reasoning capabilities, their performance on deepfake detection is poor, often producing explanations that are misaligned with visual evidence or hallucinatory. To address this limitation, we introduce a reasoning-annotated dataset for deepfake detection and propose Paragraph-level Relative Policy Optimization (PRPO), a reinforcement learning algorithm that aligns LLM reasoning with image content at the paragraph level. Experiments show that PRPO improves detection accuracy by a wide margin and achieves the highest reasoning score of 4.55/5.0. Ablation studies further demonstrate that PRPO significantly outperforms GRPO under test-time conditions. These results underscore the importance of grounding multimodal reasoning in visual evidence to enable more reliable and interpretable deepfake detection.


Key Contributions

  • Reasoning-annotated dataset for deepfake detection that pairs synthetic images with grounded visual explanations
  • PRPO (Paragraph-level Relative Policy Optimization), an RL algorithm that aligns VLM reasoning with image content at the paragraph level
  • Empirical demonstration that PRPO significantly outperforms GRPO under test-time conditions, achieving a reasoning score of 4.55/5.0
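
To illustrate what "relative" policy optimization at the paragraph level could look like, here is a minimal sketch. It is not the paper's algorithm. The first function shows a common formulation of the group-relative advantage used in GRPO-style training (each sampled response's reward is normalized against the mean and standard deviation of its group). The second function is a hypothetical paragraph-level variant in which each response carries one reward per paragraph. The function names, the per-paragraph reward inputs, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each response-level reward
    against the mean and std of its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


def paragraph_level_advantages(paragraph_rewards, eps=1e-8):
    """Hypothetical paragraph-level variant (NOT taken from the paper).

    paragraph_rewards: one list per sampled response, holding one
    reward per paragraph of that response (for example, a score for
    how well the paragraph matches the image content).
    Each paragraph reward is normalized against all paragraph rewards
    in the group, so credit is assigned per paragraph rather than per
    whole response.
    """
    flat = [r for response in paragraph_rewards for r in response]
    mu = mean(flat)
    sigma = pstdev(flat)
    return [[(r - mu) / (sigma + eps) for r in response]
            for response in paragraph_rewards]


if __name__ == "__main__":
    # Two sampled responses scored as a whole.
    print(group_relative_advantages([1.0, 0.0]))
    # The same two responses, scored paragraph by paragraph.
    print(paragraph_level_advantages([[1.0, 0.0], [0.0, 1.0]]))
```

In this toy setup the paragraph-level version can reward a well-grounded paragraph and penalize a poorly grounded one within the same response, which a single response-level reward cannot do.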

🛡️ Threat Analysis

Output Integrity Attack

Deepfake detection is explicitly an ML09 concern (AI-generated content detection / output integrity). The paper proposes a novel forensic detection method, PRPO. Rather than applying an existing detector to a new domain, it introduces a new RL-based training paradigm that grounds multimodal reasoning in visual evidence for synthetic media detection.


Details

Domains
vision, multimodal, nlp
Model Types
vlm, llm, transformer
Threat Tags
inference_time, digital
Applications
deepfake detection, synthetic media detection, ai-generated image detection