SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning
Hessen Bougueffa Eutamene 1, Abdellah Zakaria Sellam 2,3, Abdelmalik Taleb-Ahmed 1, Abdenour Hadid 4
1 University of Polytechnique Hauts-de-France
Published on arXiv
2604.03833
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves 94.6% mean accuracy on the UniversalFakeDetect benchmark across 19 generative models, including GANs, face-swapping, and diffusion methods
SPARK-IL
Novel technique introduced
Detecting AI-generated images remains a significant challenge because detectors trained on specific generators often fail to generalize to unseen models. While pixel-level artifacts vary across models, frequency-domain signatures are more consistent, which makes them a promising foundation for cross-generator detection. We propose SPARK-IL, a retrieval-augmented framework that combines dual-path spectral analysis with incremental learning. One path uses a partially frozen ViT-L/14 encoder for semantic representations, and a parallel path produces raw RGB pixel embeddings. Both paths undergo multi-band Fourier decomposition into four frequency bands. Each band is processed by Kolmogorov-Arnold Networks (KAN) with mixture-of-experts for band-specific transformations, and the resulting spectral embeddings are fused via cross-attention with residual connections. During inference, the fused embedding retrieves the k nearest labeled signatures from a Milvus database using cosine similarity, and the prediction is made by majority voting. An incremental learning strategy expands the database and employs elastic weight consolidation to preserve previously learned transformations. Evaluated on the UniversalFakeDetect benchmark across 19 generative models (including GANs, face-swapping, and diffusion methods), SPARK-IL achieves 94.6% mean accuracy. The code is to be publicly released at https://github.com/HessenUPHF/SPARK-IL.
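The inference step described in the abstract (cosine-similarity retrieval of the k nearest labeled signatures, followed by majority voting) can be illustrated with a minimal sketch. This is not the authors' implementation: an in-memory list stands in for the Milvus database, the three-dimensional vectors are toy data, and the names `cosine_similarity`, `knn_majority_vote`, and `signatures` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def knn_majority_vote(query, database, k=3):
    """Return the most common label among the k signatures closest to query.

    database is a list of (embedding, label) pairs; here it is a plain list
    standing in for a vector database (assumption for illustration only).
    """
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Toy labeled signatures (made-up values, not real spectral embeddings).
signatures = [
    ([1.0, 0.1, 0.0], "real"),
    ([0.9, 0.2, 0.1], "real"),
    ([0.0, 1.0, 0.9], "fake"),
    ([0.1, 0.8, 1.0], "fake"),
    ([0.2, 0.9, 0.7], "fake"),
]

prediction = knn_majority_vote([0.05, 0.9, 0.95], signatures, k=3)
print(prediction)
```

In this sketch, supporting a new generator only requires appending labeled signatures to `signatures`; the voting code itself does not change.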
Key Contributions
- Dual-path multi-band spectral architecture combining pixel-level and feature-level representations via FFT and KAN modules
- Retrieval-augmented classification using cosine similarity search over spectral signatures in a Milvus database
- Incremental learning via elastic weight consolidation enabling adaptation to new generators without catastrophic forgetting
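As a rough illustration of the elastic weight consolidation term in the last contribution, the sketch below uses the standard diagonal-Fisher form of the penalty, (λ/2) Σ F_i (θ_i − θ*_i)², added to the loss on the new data. The summary does not state which variant SPARK-IL uses, so the formula choice, the function names `ewc_penalty` and `total_loss`, and the toy numbers are all assumptions.

```python
def ewc_penalty(params, anchor_params, fisher_diag, lam=1.0):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    params        -- current parameter values
    anchor_params -- parameter values saved after the previous task
    fisher_diag   -- diagonal Fisher information estimates (importance weights)
    """
    if not (len(params) == len(anchor_params) == len(fisher_diag)):
        raise ValueError("all inputs must have the same length")
    return 0.5 * lam * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def total_loss(task_loss, params, anchor_params, fisher_diag, lam=1.0):
    """Loss on the new data plus the EWC regularizer."""
    return task_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)


# Toy values (hypothetical): the first parameter is important but unchanged,
# the second has drifted by 1.0 but carries low importance.
penalty = ewc_penalty(
    params=[1.0, 2.0],
    anchor_params=[1.0, 1.0],
    fisher_diag=[10.0, 0.5],
    lam=2.0,
)
print(penalty)
```

Parameters with large Fisher values are pulled strongly toward their previous values, while low-importance parameters remain free to adapt to the new generator.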
🛡️ Threat Analysis
AI-generated image detection (deepfake detection) - verifying content authenticity and detecting synthetic images produced by generative models.