Survey · 2025

Toward Generalized Detection of Synthetic Media: Limitations, Challenges, and the Path to Multimodal Solutions

Redwan Hussain, Mizanur Rahman, Prithwiraj Bhattacharjee

1 citation · arXiv


Published on arXiv · 2511.11116

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Current AI-generated media detectors frequently fail to generalize across unseen generator models and struggle with multimodal or heavily post-processed content, suggesting multimodal deep learning as a promising research direction.


Artificial intelligence (AI) in media has advanced rapidly over the last decade. The introduction of Generative Adversarial Networks (GANs) raised the quality of photorealistic image generation, and diffusion models later opened a new era of generative media. These advances have made it difficult to distinguish real content from synthetic content. The rise of deepfakes demonstrated how these tools can be misused for misinformation, political conspiracies, privacy violations, and fraud. In response, many detection models have been developed, most of them based on deep learning methods such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). These models search for visual, spatial, or temporal anomalies. However, such approaches often fail to generalize to unseen data and struggle with content produced by generator models not seen during training. In addition, existing approaches are largely ineffective on multimodal data and heavily post-processed content. This study reviews twenty-four recent works on AI-generated media detection. Each study was examined individually to identify its contributions and weaknesses. The review then summarizes the common limitations and key challenges faced by current approaches. Based on this analysis, a research direction is suggested with a focus on multimodal deep learning models, which have the potential to provide more robust and generalized detection. The review offers future researchers a clear starting point for building stronger defenses against harmful synthetic media.
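The anomaly-hunting idea the abstract mentions can be made concrete with a toy example. The sketch below (not from the survey; all function names are mine, and real detectors are learned models, not fixed thresholds) computes an azimuthally averaged log power spectrum of a grayscale image and measures how much spectral energy sits in the high-frequency bins — one classic hand-crafted cue, since some GAN upsampling pipelines leave periodic high-frequency artifacts:

```python
import numpy as np

def radial_power_profile(img: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Azimuthally averaged log power spectrum of a 2D grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(img))          # center DC component
    power = np.log1p(np.abs(f) ** 2)               # log-compressed power
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)             # radius from spectrum center
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    totals = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return totals / np.maximum(counts, 1)          # mean power per radial bin

def high_freq_ratio(img: np.ndarray) -> float:
    """Fraction of radial spectral energy in the upper half of the bins."""
    p = radial_power_profile(img)
    return float(p[len(p) // 2:].sum() / p.sum())

# A pixel-level checkerboard (Nyquist-frequency artifact) scores higher
# than a smooth low-frequency sine pattern.
checker = (np.indices((64, 64)).sum(axis=0) % 2).astype(float)
smooth = np.tile(np.sin(2 * np.pi * 2 * np.arange(64) / 64), (64, 1))
print(high_freq_ratio(checker) > high_freq_ratio(smooth))
```

A real system would feed such spectral features (or raw pixels) into a trained CNN or ViT classifier; the point here is only to illustrate what a "frequency-domain anomaly" means in practice.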


Key Contributions

  • Systematic review of 24 recent AI-generated media detection methods, cataloguing individual contributions and weaknesses
  • Synthesis of common limitations across current detection approaches, including poor generalization to unseen generators and multimodal content
  • Research roadmap proposing multimodal deep learning models as a more robust path for synthetic media detection

🛡️ Threat Analysis

Output Integrity Attack

The paper is entirely about detecting AI-generated/synthetic media (deepfakes, GAN and diffusion model outputs) — a core output integrity and content authenticity problem. It reviews detection approaches, their limitations (poor cross-model generalization, vulnerability to modification), and proposes future research directions for more robust detection.


Details

Domains
vision · multimodal · generative
Model Types
gan · diffusion · cnn · transformer
Threat Tags
inference_time
Applications
deepfake detection · synthetic image detection · ai-generated media detection