Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection
Sunpill Kim, Chanwoo Hwang, Minsu Kim, Jae Hong Seo
Published on arXiv
arXiv:2603.10504
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Commercial generative AI chatbots enable non-expert adversaries to refine deepfakes that evade state-of-the-art detectors while preserving identity and substantially improving perceptual quality, with commercial services posing greater risk than open-source models.
Semantic-Preserving Image Refinement via Generative AI Chatbots
Novel technique introduced
Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabilities fundamentally undermines modern deepfake detectors. Rather than proposing a new image manipulation technique, we study a realistic and already-deployed usage scenario in which an adversary uses only benign, policy-compliant prompts and commercial generative AI systems. We demonstrate that state-of-the-art deepfake detection methods fail under semantic-preserving image refinement. Specifically, we show that generative AI systems articulate explicit authenticity criteria and inadvertently externalize them through unrestricted reasoning, enabling their direct reuse as refinement objectives. As a result, refined images simultaneously evade detection, preserve identity as verified by commercial face recognition APIs, and exhibit substantially higher perceptual quality. Importantly, we find that widely accessible commercial chatbot services pose a significantly greater security risk than open-source models, as their superior realism, semantic controllability, and low-barrier interfaces enable effective evasion by non-expert users. Our findings reveal a structural mismatch between the threat models assumed by current detection frameworks and the actual capabilities of real-world generative AI. While detection baselines are largely shaped by prior benchmarks, deployed systems expose unrestricted authenticity reasoning and refinement despite stringent safety controls in other domains.
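The attack flow described above (elicit authenticity criteria from a chatbot, reuse them as refinement objectives, and iterate until the detector is evaded while identity is preserved) can be sketched as a simple loop. This is an illustrative sketch only, not the paper's implementation: the function names (`elicit_criteria`, `refine`, `detector_score`, `same_identity`) and the round budget are hypothetical placeholders for the chatbot, image-refinement, detector, and face-recognition components an adversary would plug in.

```python
from typing import Callable, List, Optional

def refinement_attack(
    image: str,
    elicit_criteria: Callable[[], List[str]],
    refine: Callable[[str, str], str],
    detector_score: Callable[[str], float],
    same_identity: Callable[[str, str], bool],
    threshold: float = 0.5,
    max_rounds: int = 5,
) -> Optional[str]:
    """Iteratively refine a fake image using chatbot-elicited
    authenticity criteria until the deepfake detector no longer
    flags it, while verifying that identity is preserved.

    All callables are placeholders for external systems
    (chatbot, image model, detector, face-recognition API).
    """
    # Step 1: ask the chatbot what makes an image look authentic,
    # e.g. "natural skin texture", "consistent lighting".
    criteria = elicit_criteria()
    current = image
    for _ in range(max_rounds):
        # Step 3: stop once the detector is evaded AND the
        # refined image still matches the original identity.
        if detector_score(current) < threshold and same_identity(image, current):
            return current
        # Step 2: reuse each authenticity criterion as a
        # refinement objective for the image model.
        for criterion in criteria:
            current = refine(current, criterion)
    return None  # evasion not achieved within the round budget
```

The loop makes the paper's point structurally visible: no component requires expertise or policy-violating prompts, since the detector-evasion knowledge is externalized by the chatbot's own authenticity reasoning.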
Key Contributions
- Demonstrates that commercial generative AI chatbots expose authenticity reasoning that can be directly reused as refinement objectives to evade deepfake detectors using only policy-compliant prompts
- Shows refined deepfakes simultaneously evade state-of-the-art detectors, preserve identity (verified by commercial face recognition APIs), and improve perceptual quality
- Reveals a structural mismatch between threat models assumed by current detection frameworks and the actual capabilities naively exposed by deployed commercial AI systems
🛡️ Threat Analysis
The paper demonstrates attacks on deepfake detection systems (a canonical ML09 content integrity use case) by refining AI-generated images to evade detectors while preserving identity. Its contribution is showing that generative AI chatbots can be used to undermine content authentication pipelines, which is squarely an output integrity and deepfake detection evasion concern.