AFSS: Artifact-Focused Self-Synthesis for Mitigating Bias in Audio Deepfake Detection
Hai-Son Nguyen-Le, Hung-Cuong Nguyen-Thanh, Nhien-An Le-Khac et al. · University of Science · University College Dublin
Mitigates detector bias in audio deepfake detection via self-synthesis, forcing models to focus on generation artifacts rather than confounding factors
The rapid advancement of generative models has enabled highly realistic audio deepfakes, yet current detectors suffer from a critical bias problem, leading to poor generalization across unseen datasets. This paper proposes Artifact-Focused Self-Synthesis (AFSS), a method designed to mitigate this bias by generating pseudo-fake samples from real audio via two mechanisms: self-conversion and self-reconstruction. The core insight of AFSS lies in enforcing same-speaker constraints, ensuring that real and pseudo-fake samples share identical speaker identity and semantic content. This forces the detector to focus exclusively on generation artifacts rather than irrelevant confounding factors. Furthermore, we introduce a learnable reweighting loss to dynamically emphasize synthetic samples during training. Extensive experiments across 7 datasets demonstrate that AFSS achieves state-of-the-art performance with an average EER of 5.45%, including a significant reduction to 1.23% on WaveFake and 2.70% on In-the-Wild, all while eliminating the dependency on pre-collected fake datasets. Our code is publicly available at https://github.com/NguyenLeHaiSonGit/AFSS.
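The abstract above mentions a learnable reweighting loss that dynamically emphasizes the pseudo-fake samples. The paper's exact formulation is not given here; the following is a minimal sketch of one plausible parameterization, assuming a sigmoid-squashed learnable scalar weight on the pseudo-fake loss term and the convention real = 0, fake = 1. All names and numbers are illustrative, not the authors'.

```python
import numpy as np

def bce(p, y):
    # Binary cross-entropy of predicted probabilities p against labels y.
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

def reweighted_loss(p_real, p_fake, alpha):
    """Total loss with a learnable weight w = sigmoid(alpha) on pseudo-fakes.

    alpha is a trainable scalar (hypothetical parameterization); squashing
    through a sigmoid keeps the weight in (0, 1). Label convention assumed:
    real audio = 0, pseudo-fake = 1.
    """
    w = 1.0 / (1.0 + np.exp(-alpha))
    return bce(p_real, np.zeros_like(p_real)) + w * bce(p_fake, np.ones_like(p_fake))

# Toy check: a larger alpha up-weights the pseudo-fake term in the total loss.
p_real = np.array([0.1, 0.2])   # detector scores on real audio
p_fake = np.array([0.6, 0.4])   # detector scores on pseudo-fake audio
low = reweighted_loss(p_real, p_fake, alpha=-2.0)
high = reweighted_loss(p_real, p_fake, alpha=+2.0)
```

In training, alpha would be optimized jointly with the detector so that the contribution of the self-synthesized samples is adapted rather than fixed by hand.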
Michael Rettinger, Ben Beaumont, Nhien-An Le-Khac et al. · University College Dublin
Benchmarks six public deepfake detection tools with law enforcement investigators, finding humans outperform all automated AI classifiers and forensic tools
The proliferation of deepfake imagery poses escalating challenges for practitioners tasked with verifying digital media authenticity. While detection algorithm research is abundant, empirical evaluations of publicly accessible tools that practitioners actually use remain scarce. This paper presents the first cross-paradigm evaluation of six tools, spanning two complementary detection approaches: forensic analysis tools (InVID & WeVerify, FotoForensics, Forensically) and AI-based classifiers (DecopyAI, FaceOnLive, Bitmind). Both tool categories were evaluated by professional investigators with law enforcement experience using blinded protocols across datasets comprising authentic, tampered, and AI-generated images sourced from DF40, CelebDF, and CASIA-v2. We report three principal findings: forensic tools exhibit high recall but poor specificity, while AI classifiers demonstrate the inverse pattern; human evaluators substantially outperform all automated tools; and human-AI disagreement is asymmetric, with human judgment prevailing in the vast majority of discordant cases. We discuss implications for practitioner workflows and identify critical gaps in current detection capabilities.
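The recall/specificity contrast reported above (forensic tools flag nearly everything; AI classifiers are conservative) can be made concrete with a small worked example. The confusion counts below are hypothetical illustrations, not the paper's data.

```python
def recall_specificity(tp, fn, tn, fp):
    """Recall = TP/(TP+FN) on manipulated images; specificity = TN/(TN+FP) on authentic ones."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical confusion counts for 50 manipulated + 50 authentic images.
forensic = recall_specificity(tp=45, fn=5, tn=20, fp=30)   # flags almost everything
ai_clf = recall_specificity(tp=25, fn=25, tn=45, fp=5)     # conservative, misses fakes
print(forensic)  # high recall, low specificity
print(ai_clf)    # the inverse pattern
```

A tool with the first profile rarely misses a fake but wastes investigator time on false alarms; one with the second profile is trustworthy when it fires but lets many fakes through, which is why the two paradigms are described as complementary.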
Hong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen et al. · University College Dublin · Trinity College Dublin +1 more
Semi-supervised detector discovers latent GAN vs. diffusion model patterns to generalize AI-generated image detection across unseen generator architectures
The rapid advancement of generators (e.g., StyleGAN, Midjourney, DALL-E) has produced highly realistic synthetic images, posing significant challenges to digital media authenticity. These generators are typically based on a few core architectural families, primarily Generative Adversarial Networks (GANs) and Diffusion Models (DMs). A critical vulnerability in current forensics is the failure of detectors to achieve cross-generator generalization, especially when crossing architectural boundaries (e.g., from GANs to DMs). We hypothesize that this gap stems from fundamental differences in the artifacts produced by these distinct architectures. In this work, we provide a theoretical analysis explaining how the distinct optimization objectives of the GAN and DM architectures lead to different manifold coverage behaviors. We demonstrate that GANs permit partial coverage, often leading to boundary artifacts, while DMs enforce complete coverage, resulting in over-smoothing patterns. Motivated by this analysis, we propose the Triarchy Detector (TriDetect), a semi-supervised approach that enhances binary classification by discovering latent architectural patterns within the "fake" class. TriDetect employs balanced cluster assignment via the Sinkhorn-Knopp algorithm and a cross-view consistency mechanism, encouraging the model to learn fundamental architectural distinctions. We evaluate our approach on two standard benchmarks and three in-the-wild datasets against 13 baselines to demonstrate its generalization capability to unseen generators.
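TriDetect's balanced cluster assignment relies on the Sinkhorn-Knopp algorithm. The sketch below shows the standard Sinkhorn normalization used for balanced soft clustering (alternating row/column normalization of a scaled similarity matrix); the exact scores, temperature, and iteration count in the paper may differ, so treat this as a generic illustration of the mechanism.

```python
import numpy as np

def sinkhorn_knopp(scores, n_iters=50, eps=0.05):
    """Balanced soft assignment of N samples to K clusters.

    Alternately normalizes the columns and rows of exp(scores / eps) so that
    each cluster receives (approximately) equal total assignment mass while
    each sample's assignment still sums to one.
    """
    Q = np.exp(scores / eps)                      # N x K similarity matrix
    Q /= Q.sum()
    N, K = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True); Q /= K  # equal mass per cluster
        Q /= Q.sum(axis=1, keepdims=True); Q /= N  # unit mass per sample
    return Q * N                                   # rows now sum to 1

rng = np.random.default_rng(0)
scores = rng.normal(size=(12, 3))   # 12 "fake" embeddings, 3 latent architecture clusters
Q = sinkhorn_knopp(scores)
print(Q.sum(axis=1))   # each row ~= 1: a soft assignment per sample
print(Q.sum(axis=0))   # each column ~= 12/3 = 4: balanced clusters
```

The balance constraint is what prevents the degenerate solution where all "fake" samples collapse into a single cluster, letting the latent GAN-vs-DM structure emerge without architecture labels.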
Tharindu Lakshan Yasarathna, Nhien-An Le-Khac · University College Dublin
Surveys adversarial attack taxonomy against DL-based network anomaly detectors, covering evasion, data poisoning, and membership inference in SDN-IoT
Integrating Software-Defined Networking (SDN) and the Internet of Things (IoT) enhances network control and flexibility. DL-based AAD systems improve security by enabling real-time threat detection in SDN-IoT networks. However, these systems remain vulnerable to adversarial attacks that manipulate input data or exploit model weaknesses, significantly degrading detection accuracy. Existing research lacks a systematic analysis of adversarial vulnerabilities specific to DL-based AAD systems in SDN-IoT environments. This SoK study introduces a structured adversarial threat model and a comprehensive taxonomy of attacks, categorising them into data, model, and hybrid-level threats. Unlike previous studies, we systematically evaluate white-, black-, and grey-box attack strategies across popular benchmark datasets. Our findings reveal that adversarial attacks can reduce detection accuracy by up to 48.4%, with Membership Inference causing the most significant drop, while C&W and DeepFool achieve high evasion success rates. Adversarial training enhances robustness, but its high computational overhead limits real-time deployment in SDN-IoT applications. We propose adaptive countermeasures, including real-time adversarial mitigation, enhanced retraining mechanisms, and explainable AI-driven security frameworks. By integrating structured threat models, this study offers a more comprehensive approach to attack categorisation, impact assessment, and defence evaluation than previous research. Our work highlights critical vulnerabilities in existing DL-based AAD models and provides practical recommendations for improving resilience, interpretability, and computational efficiency. This study serves as a foundational reference for researchers and practitioners seeking to enhance DL-based AAD security in SDN-IoT networks, offering a systematic adversarial threat model and conceptual defence evaluation based on prior empirical studies.
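The evasion attacks surveyed above (C&W, DeepFool) are gradient-based perturbations of detector inputs. The simplest member of that family, the Fast Gradient Sign Method (FGSM), illustrates the mechanism; the toy logistic "anomaly detector" and its weights below are hypothetical, chosen only to show how a malicious flow's score can be pushed down.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_evasion(x, w, b, eps):
    """FGSM-style evasion against a logistic anomaly detector.

    Perturbs a malicious flow x (true label 1) to increase the detector's
    loss for that label: x' = x + eps * sign(dL/dx), where for binary
    cross-entropy with label 1, dL/dx = (p - 1) * w.
    """
    p = sigmoid(x @ w + b)
    grad = (p - 1.0) * w               # gradient of the loss w.r.t. the input
    return x + eps * np.sign(grad)     # ascend the loss -> lower anomaly score

w = np.array([1.5, -2.0, 0.8]); b = -0.2   # toy trained detector weights
x = np.array([1.0, -1.0, 1.0])             # features of an anomalous flow
adv = fgsm_evasion(x, w, b, eps=0.5)
print(sigmoid(x @ w + b), sigmoid(adv @ w + b))  # score drops after the attack
```

C&W and DeepFool refine this idea with optimized, minimally perceptible perturbations, which is why the survey reports their higher evasion success rates against DL-based detectors.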
Dat-Thinh Nguyen, Kim-Hung Le, Nhien-An Le-Khac · University College Dublin · University of Information Technology +1 more
Meta-learning stabilizes data-free model extraction attacks by reducing distribution shift in synthetic query generation
Model extraction is a severe threat to Machine Learning-as-a-Service systems, especially through data-free approaches, where dishonest users can replicate the functionality of a black-box target model without access to realistic data. Despite recent advancements, existing data-free model extraction methods suffer from the oscillating accuracy of the substitute model. This oscillation, which could be attributed to the constant shift in the generated data distribution during the attack, makes the attack impractical since the optimal substitute model cannot be determined without access to the target model's in-distribution data. Hence, we propose MetaDFME, a novel data-free model extraction method that employs meta-learning in the generator training to reduce the distribution shift, aiming to mitigate the substitute model's accuracy oscillation. In detail, we train our generator to iteratively capture the meta-representations of the synthetic data during the attack. These meta-representations can be adapted with a few steps to produce data that facilitates the substitute model's learning from the target model while reducing the effect of distribution shifts. Our experiments on popular image benchmarks, MNIST, SVHN, CIFAR-10, and CIFAR-100, demonstrate that MetaDFME outperforms the current state-of-the-art data-free model extraction method while exhibiting more stable substitute-model accuracy during the attack.
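The "meta-representations adaptable in a few steps" described above follow the general shape of first-order MAML: an inner loop that quickly adapts to the current objective and an outer loop that moves the meta-parameters toward a point from which all objectives are reachable. The sketch below uses a scalar parameter and toy quadratic losses standing in for snapshots of the shifting extraction objective; MetaDFME's actual generator, losses, and update rules are not specified here.

```python
def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.05, inner_steps=3):
    """One first-order MAML update of a scalar meta-parameter theta.

    Each 'task' c defines a toy loss (phi - c)^2, a stand-in for the
    extraction objective at one point in the attack. The inner loop adapts
    a copy phi of theta for a few steps; the outer update moves theta using
    the post-adaptation gradients, averaged over tasks.
    """
    meta_grad = 0.0
    for c in tasks:
        phi = theta
        for _ in range(inner_steps):            # fast adaptation to this task
            phi -= inner_lr * 2 * (phi - c)     # gradient of (phi - c)^2
        meta_grad += 2 * (phi - c)              # first-order outer gradient
    return theta - outer_lr * meta_grad / len(tasks)

theta = 5.0
tasks = [0.0, 1.0, 2.0]       # stand-ins for shifting data distributions
for _ in range(100):
    theta = maml_step(theta, tasks)
print(theta)   # settles near a point cheap to adapt toward any task
```

The stabilizing intuition matches the abstract: instead of chasing each new data distribution from scratch (the source of accuracy oscillation), the generator sits at a meta-point from which a few adaptation steps suffice.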