DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection
Jialiang Shen 1,2, Jiyang Zheng 1,3, Yunqi Xue 1, Huajie Chen 4, Yu Yao 5, Hui Kang 1, Ruiqi Liu 1, Helin Gong 5, Yang Yang 4, Dadong Wang 5, Tongliang Liu 3
Published on arXiv
2511.12511
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves state-of-the-art AIGI detection performance under both motion-blurred and clean conditions, demonstrating improved generalization in real-world settings.
DINO-Detect
Novel technique introduced
With growing concerns over image authenticity and digital safety, the field of AI-generated image (AIGI) detection has progressed rapidly. Yet, most AIGI detectors still struggle under real-world degradations, particularly motion blur, which frequently occurs in handheld photography, fast motion, and compressed video. Such blur distorts fine textures and suppresses high-frequency artifacts, causing severe performance drops in real-world settings. We address this limitation with a blur-robust AIGI detection framework based on teacher-student knowledge distillation. A high-capacity teacher (DINOv3), trained on clean (i.e., sharp) images, provides stable and semantically rich representations that serve as a reference for learning. By freezing the teacher to maintain its generalization ability, we distill its feature and logit responses from sharp images to a student trained on blurred counterparts, enabling the student to produce consistent representations under motion degradation. Extensive experiments on benchmarks show that our method achieves state-of-the-art performance under both motion-blurred and clean conditions, demonstrating improved generalization and real-world applicability. Source code will be released at: https://github.com/JiaLiangShen/Dino-Detect-for-blur-robust-AIGC-Detection.
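To make the claim about suppressed high-frequency artifacts concrete, the following is a minimal sketch, not the paper's blur pipeline. It assumes a simplified one-directional box kernel as the motion blur and uses the sum of squared neighbor differences as a rough stand-in for high-frequency energy. The function names `horizontal_motion_blur` and `high_frequency_energy` are hypothetical and chosen for illustration only.

```python
def horizontal_motion_blur(image, length):
    """Average each pixel with its next `length - 1` right-hand neighbors.

    Simplified linear motion blur (an assumption, not the paper's kernel).
    Pixels past the right edge are clamped to the last column.
    """
    width = len(image[0])
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            window = [row[min(x + k, width - 1)] for k in range(length)]
            new_row.append(sum(window) / length)
        blurred.append(new_row)
    return blurred


def high_frequency_energy(image):
    """Sum of squared differences between horizontally adjacent pixels."""
    return sum(
        (row[x + 1] - row[x]) ** 2
        for row in image
        for x in range(len(row) - 1)
    )


# A fine vertical-stripe texture: columns alternate between 0.0 and 1.0.
sharp = [[float(x % 2) for x in range(16)] for _ in range(4)]
blurred = horizontal_motion_blur(sharp, length=4)

print("sharp energy:  ", high_frequency_energy(sharp))
print("blurred energy:", high_frequency_energy(blurred))
```

The blurred texture retains far less neighbor-to-neighbor variation than the sharp one, which is the kind of fine detail a detector relying on high-frequency cues would lose.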
Key Contributions
- Blur-robust AIGI detection framework using teacher-student knowledge distillation with a frozen DINOv3 teacher trained on clean images
- Student network trained to produce consistent representations under motion blur by mimicking the teacher's feature and logit responses on sharp images
- State-of-the-art performance under both motion-blurred and clean conditions, improving real-world generalization
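The distillation idea in the contributions above can be sketched as a combined objective. This summary does not give the paper's exact losses, so the choices here are assumptions: mean squared error for feature matching, temperature-softened KL divergence for logit matching, a cross-entropy term on the real/generated label, and hypothetical weights `alpha` and `beta`. The teacher outputs are treated as fixed numbers computed from the sharp image, reflecting the frozen teacher; the student outputs come from the blurred counterpart.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def feature_loss(teacher_feat, student_feat):
    """Mean squared error between teacher and student feature vectors."""
    squared = [(t - s) ** 2 for t, s in zip(teacher_feat, student_feat)]
    return sum(squared) / len(squared)


def logit_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_feat, teacher_logits,
                      student_feat, student_logits,
                      label, alpha=1.0, beta=1.0):
    """Cross-entropy on the label plus weighted feature and logit terms.

    alpha and beta are hypothetical weights, not values from the paper.
    """
    cross_entropy = -math.log(softmax(student_logits)[label])
    return (cross_entropy
            + alpha * feature_loss(teacher_feat, student_feat)
            + beta * logit_loss(teacher_logits, student_logits))


# Teacher responses on a sharp image (fixed, since the teacher is frozen).
teacher_feat, teacher_logits = [0.2, -0.5, 1.0], [2.0, -1.0]
# Student responses on the motion-blurred version of the same image.
student_feat, student_logits = [0.1, -0.3, 0.6], [0.5, 0.2]

loss = distillation_loss(teacher_feat, teacher_logits,
                         student_feat, student_logits, label=0)
print("distillation loss:", loss)
```

Both distillation terms vanish when the student's features and logits on the blurred image match the teacher's on the sharp image, which is the consistency the contributions describe. In a real training loop only the student's parameters would receive gradients.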
🛡️ Threat Analysis
Proposes a novel forensic detection architecture for identifying AI-generated images — a core ML09 concern around output integrity and content provenance. The blur-robustness contribution is a new technical method (DINO-based knowledge distillation), not merely applying an existing detector to a domain.