Benchmark · 2025

DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline

Rui Zhang, Hongxia Wang, Hangqing Liu, Yang Zhou, Qiang Zeng

0 citations · 75 references · arXiv


Published on arXiv · 2511.23377

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Achieves pixel-level F1 of 82.56% on DEAL-300K test split and 80.97% on the external CoCoGlide benchmark using a frozen VFM with Multi-Frequency Prompt Tuning.

MFPT (Multi-Frequency Prompt Tuning)

Novel technique introduced


Diffusion-based image editing has made semantic-level image manipulation easy for general users, but it also enables realistic local forgeries that are hard to localize. Existing benchmarks mainly focus on the binary detection of generated images or the localization of manually edited regions, and do not reflect the properties of diffusion-based edits, which often blend smoothly into the original content. We present the Diffusion-Based Image Editing Area Localization Dataset (DEAL-300K), a large-scale dataset for diffusion-based image manipulation localization (DIML) with more than 300,000 annotated images. We build DEAL-300K by using a multi-modal large language model to generate editing instructions, a mask-free diffusion editor to produce manipulated images, and an active-learning change detection pipeline to obtain pixel-level annotations. On top of this dataset, we propose a localization framework that uses a frozen Visual Foundation Model (VFM) together with Multi-Frequency Prompt Tuning (MFPT) to capture both semantic and frequency-domain cues of edited regions. Trained on DEAL-300K, our method reaches a pixel-level F1 score of 82.56% on our test split and 80.97% on the external CoCoGlide benchmark, providing strong baselines and a practical foundation for future DIML research. The dataset can be accessed via https://github.com/ymhzyj/DEAL-300K.


Key Contributions

  • DEAL-300K: a 300K+ annotated image dataset for diffusion-based image manipulation localization (DIML), built with MLLM-generated instructions, mask-free diffusion editing, and active-learning change detection for pixel-level annotation
  • Multi-Frequency Prompt Tuning (MFPT) framework that pairs a frozen Visual Foundation Model with frequency-domain cues to localize diffusion-edited regions
  • Establishes strong baseline results (F1 82.56% on DEAL-300K, 80.97% on external CoCoGlide) and provides the largest-scale benchmark for DIML research to date
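The frequency-domain side of MFPT can be illustrated with a small sketch. The paper's actual prompt-tuning architecture is not reproduced here; this only shows one plausible way to split an image into radial frequency bands with an FFT, on the hypothesis (stated in the abstract) that diffusion edits leave traces in frequency-domain cues. The band boundaries and the `frequency_band_cues` helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def frequency_band_cues(image, bands=((0.0, 0.1), (0.1, 0.5), (0.5, 1.0))):
    """Split a grayscale image into radial frequency bands via the FFT.

    Low bands keep smooth content; high bands keep edges and texture,
    where diffusion-based edits often leave subtle artifacts.
    Illustrative only -- the paper's MFPT design may differ.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx**2 + fy**2)  # normalized radial frequency, max ~0.707
    cues = []
    for lo, hi in bands:
        mask = (radius >= lo) & (radius < hi)  # disjoint annular band mask
        band = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
        cues.append(band)
    # (num_bands, H, W): each map could be projected into prompt tokens
    # and fed alongside the frozen VFM's features.
    return np.stack(cues)

# Since the bands are disjoint and cover the whole spectrum,
# they sum back to (approximately) the original image.
img = np.random.rand(32, 32)
cues = frequency_band_cues(img)
recon = cues.sum(axis=0)
```

In a prompt-tuning setting, each band map would typically be embedded and prepended as learnable prompts while the VFM backbone stays frozen, so only the prompt parameters are trained.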

🛡️ Threat Analysis

Output Integrity Attack

The primary contribution is detecting and localizing AI-generated (diffusion-edited) regions in images. This is AI-generated content detection for output/content integrity, falling squarely under ML09.


Details

Domains
vision, generative
Model Types
diffusion, transformer, vlm
Threat Tags
inference_time, digital
Datasets
DEAL-300K, CoCoGlide
Applications
diffusion-based image manipulation detection, image forensics, fake image localization