Real-Aware Residual Model Merging for Deepfake Detection
Jinhee Park 1,2, Guisik Kim 1, Choongsang Cho 1, Junseok Kwon 2
Published on arXiv: 2509.24367
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
R²M outperforms joint training and other model merging baselines across in-distribution, cross-dataset, and unseen-generator evaluation settings on DF40
R²M (Real-aware Residual Model Merging)
Novel technique introduced
Deepfake generators evolve quickly, making exhaustive data collection and repeated retraining impractical. We argue that model merging is a natural fit for deepfake detection: unlike generic multi-task settings with disjoint labels, deepfake specialists share the same binary decision and differ only in generator-specific artifacts. Empirically, we show that simple weight averaging preserves Real representations while attenuating Fake-specific cues. Building on these findings, we propose Real-aware Residual Model Merging (R$^2$M), a training-free parameter-space merging framework. R$^2$M estimates a shared Real component via a low-rank factorization of task vectors, decomposes each specialist into a Real-aligned part and a Fake residual, denoises residuals with layerwise rank truncation, and aggregates them with per-task norm matching to prevent any single generator from dominating. A concise rationale explains why a simple head suffices: the Real component induces a common separation direction in feature space, while truncated residuals contribute only minor off-axis variations. Across in-distribution, cross-dataset, and unseen-generator evaluations, R$^2$M outperforms joint training and other merging baselines. Importantly, R$^2$M is also composable: when a new forgery family appears, we fine-tune one specialist and re-merge, eliminating the need for full retraining.
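The merging pipeline in the abstract can be sketched in parameter space. The following is a minimal, hedged illustration for a single weight matrix, not the paper's implementation: the function name `r2m_merge`, the rank hyperparameters, and the use of a plain SVD over stacked task vectors as the "low-rank factorization" are all assumptions made for clarity.

```python
import numpy as np

def r2m_merge(base, specialists, shared_rank=1, residual_rank=4):
    """Illustrative R^2M-style merge for one weight matrix.

    base: pretrained weight (d_out, d_in); specialists: fine-tuned weights
    of the same shape. The ranks are illustrative, not the paper's values.
    """
    # Task vectors: specialist weights minus the shared pretrained base.
    taus = [w - base for w in specialists]

    # Shared "Real" component: top singular directions of the stacked task
    # vectors (a stand-in for the paper's low-rank factorization step).
    stacked = np.stack([t.ravel() for t in taus])            # (T, d)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    real = (u[:, :shared_rank] * s[:shared_rank]) @ vt[:shared_rank]
    real = real.mean(axis=0).reshape(base.shape)             # Real-aligned part

    # Fake residuals: per-specialist leftovers, denoised by rank truncation
    # (applied layerwise in the paper; here, to this one matrix).
    residuals = []
    for t in taus:
        r = t - real
        ur, sr, vrt = np.linalg.svd(r, full_matrices=False)
        residuals.append((ur[:, :residual_rank] * sr[:residual_rank])
                         @ vrt[:residual_rank])

    # Per-task norm matching so no single generator's residual dominates.
    target = np.median([np.linalg.norm(r) for r in residuals])
    residuals = [r * (target / (np.linalg.norm(r) + 1e-12)) for r in residuals]

    # Training-free merge: base + shared Real part + averaged residuals.
    return base + real + sum(residuals) / len(residuals)
```

Composability follows directly from this formulation: adding a new forgery family means appending one new specialist's weights to `specialists` and rerunning the merge, with no joint retraining.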
Key Contributions
- R²M: a training-free parameter-space merging framework that separates a shared Real component from generator-specific Fake residuals via SVD of task vectors
- Empirical finding that weight averaging preserves Real feature representations while attenuating generator-specific Fake cues
- Composable merging strategy: integrating a new forgery family only requires fine-tuning one specialist and re-merging, without full retraining
🛡️ Threat Analysis
Proposes a novel deepfake detection methodology. AI-generated content detection falls squarely under ML09 (Output Integrity Attack): the contribution is a new detection architecture (R²M model merging) for detecting AI-synthesized media, not merely an application of an existing detector to a new domain.