DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline
Rui Zhang, Hongxia Wang, Hangqing Liu, Yang Zhou, Qiang Zeng
Published on arXiv
2511.23377
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Achieves pixel-level F1 of 82.56% on DEAL-300K test split and 80.97% on the external CoCoGlide benchmark using a frozen VFM with Multi-Frequency Prompt Tuning.
MFPT (Multi-Frequency Prompt Tuning)
Novel technique introduced
Diffusion-based image editing has made semantic-level image manipulation easy for general users, but it also enables realistic local forgeries that are hard to localize. Existing benchmarks mainly focus on the binary detection of generated images or the localization of manually edited regions and do not reflect the properties of diffusion-based edits, which often blend smoothly into the original content. We present the Diffusion-Based Image Editing Area Localization Dataset (DEAL-300K), a large-scale dataset for diffusion-based image manipulation localization (DIML) with more than 300,000 annotated images. We build DEAL-300K by using a multi-modal large language model to generate editing instructions, a mask-free diffusion editor to produce manipulated images, and an active-learning change detection pipeline to obtain pixel-level annotations. On top of this dataset, we propose a localization framework that uses a frozen Visual Foundation Model (VFM) together with Multi-Frequency Prompt Tuning (MFPT) to capture both semantic and frequency-domain cues of edited regions. Trained on DEAL-300K, our method reaches a pixel-level F1 score of 82.56% on our test split and 80.97% on the external CoCoGlide benchmark, providing strong baselines and a practical foundation for future DIML research. The dataset can be accessed via https://github.com/ymhzyj/DEAL-300K.
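The abstract describes MFPT as injecting frequency-domain cues alongside a frozen backbone's semantic features. As a rough illustration of the general idea (not the paper's implementation: the band decomposition, pooling, and projection below are all hypothetical stand-ins, with a random matrix in place of learned prompt projections), one can split an image's spectrum into radial frequency bands and summarize each band as a small prompt vector:

```python
import numpy as np

def frequency_prompts(image, bands=3, prompt_dim=8, seed=0):
    """Illustrative multi-frequency prompt extraction.

    Splits the 2-D FFT magnitude spectrum into radial bands and projects
    each band's energy histogram into a fixed-length "prompt" vector.
    Shapes, band count, and the random projection are assumptions,
    not the MFPT design from the paper.
    """
    h, w = image.shape
    spec = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - cy, xx - cx)  # distance from spectrum center
    max_r = radius.max()
    rng = np.random.default_rng(seed)    # stands in for a learned projection
    prompts = []
    for b in range(bands):
        lo, hi = b * max_r / bands, (b + 1) * max_r / bands
        band_energy = spec[(radius >= lo) & (radius < hi)]
        # pool the band's energy into a fixed-length descriptor
        hist, _ = np.histogram(band_energy, bins=prompt_dim)
        hist = hist / max(hist.sum(), 1)
        proj = rng.standard_normal((prompt_dim, prompt_dim))
        prompts.append(hist @ proj)
    # one prompt token per frequency band, to be consumed by a frozen VFM
    return np.stack(prompts)

img = np.random.default_rng(1).standard_normal((64, 64))
tokens = frequency_prompts(img)
print(tokens.shape)  # (3, 8)
```

In a real prompt-tuning setup these frequency tokens would be learnable and prepended to the frozen VFM's token sequence; only the prompts (and a localization head) would receive gradients.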
Key Contributions
- DEAL-300K: a 300K+ annotated image dataset for diffusion-based image manipulation localization (DIML), built with MLLM-generated instructions, mask-free diffusion editing, and active-learning change detection for pixel-level annotation
- Multi-Frequency Prompt Tuning (MFPT) framework that pairs a frozen Visual Foundation Model with frequency-domain cues to localize diffusion-edited regions
- Establishes strong baseline results (F1 82.56% on DEAL-300K, 80.97% on external CoCoGlide) and provides the largest-scale benchmark for DIML research to date
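The headline numbers above are pixel-level F1 scores, i.e., F1 computed over per-pixel edited/unedited labels rather than per-image labels. A minimal sketch of that metric (standard definition; masks and threshold handling here are illustrative):

```python
import numpy as np

def pixel_f1(pred_mask, gt_mask):
    """Pixel-level F1 between binary masks (1 = edited region)."""
    pred = pred_mask.astype(bool).ravel()
    gt = gt_mask.astype(bool).ravel()
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# toy example: 4 ground-truth edited pixels, 6 predicted (2 false positives)
gt = np.zeros((4, 4), dtype=int); gt[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=int); pred[1:3, 1:4] = 1
print(round(pixel_f1(pred, gt), 3))  # 0.8
```

Reported scores like 82.56% are typically this quantity averaged over the test images.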
🛡️ Threat Analysis
Primary contribution is detecting and localizing AI-generated (diffusion-edited) regions in images — this is AI-generated content detection for output/content integrity, falling squarely under ML09.