DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline
Rui Zhang, Hongxia Wang, Hangqing Liu, Yang Zhou, Qiang Zeng
Published on arXiv
2511.23377
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves pixel-level F1 of 82.56% on DEAL-300K test split and 80.97% on the external CoCoGlide benchmark using a frozen VFM with Multi-Frequency Prompt Tuning.
MFPT (Multi-Frequency Prompt Tuning)
Novel technique introduced
Diffusion-based image editing has made semantic-level image manipulation easy for general users, but it also enables realistic local forgeries that are hard to localize. Existing benchmarks mainly focus on the binary detection of generated images or the localization of manually edited regions and do not reflect the properties of diffusion-based edits, which often blend smoothly into the original content. We present the Diffusion-Based Image Editing Area Localization Dataset (DEAL-300K), a large-scale dataset for diffusion-based image manipulation localization (DIML) with more than 300,000 annotated images. We build DEAL-300K by using a multi-modal large language model to generate editing instructions, a mask-free diffusion editor to produce manipulated images, and an active-learning change detection pipeline to obtain pixel-level annotations. On top of this dataset, we propose a localization framework that uses a frozen Visual Foundation Model (VFM) together with Multi-Frequency Prompt Tuning (MFPT) to capture both semantic and frequency-domain cues of edited regions. Trained on DEAL-300K, our method reaches a pixel-level F1 score of 82.56% on our test split and 80.97% on the external CoCoGlide benchmark, providing strong baselines and a practical foundation for future DIML research. The dataset can be accessed via https://github.com/ymhzyj/DEAL-300K.
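The abstract describes MFPT as injecting frequency-domain cues alongside a frozen backbone's semantic features. As a rough illustration of the general idea (not the paper's implementation: the band decomposition, pooling, and projection below are all hypothetical stand-ins, with a random matrix in place of learned prompt projections), one can split an image's spectrum into radial frequency bands and summarize each band as a small prompt vector:

```python
import numpy as np

def frequency_prompts(image, bands=3, prompt_dim=8, seed=0):
    """Illustrative multi-frequency prompt extraction.

    Splits the 2-D FFT magnitude spectrum into radial bands and projects
    each band's energy histogram into a fixed-length "prompt" vector.
    Shapes, band count, and the random projection are assumptions,
    not the MFPT design from the paper.
    """
    h, w = image.shape
    spec = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - cy, xx - cx)  # distance from spectrum center
    max_r = radius.max()
    rng = np.random.default_rng(seed)    # stands in for a learned projection
    prompts = []
    for b in range(bands):
        lo, hi = b * max_r / bands, (b + 1) * max_r / bands
        band_energy = spec[(radius >= lo) & (radius < hi)]
        # pool the band's energy into a fixed-length descriptor
        hist, _ = np.histogram(band_energy, bins=prompt_dim)
        hist = hist / max(hist.sum(), 1)
        proj = rng.standard_normal((prompt_dim, prompt_dim))
        prompts.append(hist @ proj)
    # one prompt token per frequency band, to be consumed by a frozen VFM
    return np.stack(prompts)

img = np.random.default_rng(1).standard_normal((64, 64))
tokens = frequency_prompts(img)
print(tokens.shape)  # (3, 8)
```

In a real prompt-tuning setup these frequency tokens would be learnable and prepended to the frozen VFM's token sequence; only the prompts (and a localization head) would receive gradients.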
Key Contributions
- DEAL-300K: a 300K+ annotated image dataset for diffusion-based image manipulation localization (DIML), built with MLLM-generated instructions, mask-free diffusion editing, and active-learning change detection for pixel-level annotation
- Multi-Frequency Prompt Tuning (MFPT) framework that pairs a frozen Visual Foundation Model with frequency-domain cues to localize diffusion-edited regions
- Establishes strong baseline results (F1 82.56% on DEAL-300K, 80.97% on external CoCoGlide) and provides the largest-scale benchmark for DIML research to date
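The headline numbers above are pixel-level F1 scores, i.e., F1 computed over per-pixel edited/unedited labels rather than per-image labels. A minimal sketch of that metric (standard definition; masks and threshold handling here are illustrative):

```python
import numpy as np

def pixel_f1(pred_mask, gt_mask):
    """Pixel-level F1 between binary masks (1 = edited region)."""
    pred = pred_mask.astype(bool).ravel()
    gt = gt_mask.astype(bool).ravel()
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# toy example: 4 ground-truth edited pixels, 6 predicted (2 false positives)
gt = np.zeros((4, 4), dtype=int); gt[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=int); pred[1:3, 1:4] = 1
print(round(pixel_f1(pred, gt), 3))  # 0.8
```

Reported scores like 82.56% are typically this quantity averaged over the test images.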
🛡️ Threat Analysis
Primary contribution is detecting and localizing AI-generated (diffusion-edited) regions in images — this is AI-generated content detection for output/content integrity, falling squarely under ML09.