benchmark 2026

SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection

You Hu 1, Chenzhuo Zhao 2, Changfa Mo 1, Haotian Liu 3, Xiaobai Li 3,1

0 citations

Published on arXiv

2604.08211

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

Current detection methods fail dramatically in zero-shot transfer to scientific figures and exhibit strong generator-specific overfitting, revealing a substantial gap between existing AI-generated image (AIGI) detection capabilities and high-quality scientific figure synthesis.

SciFigDetect

Novel technique introduced


Modern multimodal generators can now produce scientific figures at near-publishable quality, creating a new challenge for visual forensics and research integrity. Unlike conventional AI-generated natural images, scientific figures are structured, text-dense, and tightly aligned with scholarly semantics, making them a distinct and difficult detection target. However, existing AI-generated image (AIGI) detection benchmarks and methods are almost entirely developed for open-domain imagery, leaving this setting largely unexplored. We present the first benchmark for AI-generated scientific figure detection. To construct it, we develop an agent-based data pipeline that retrieves licensed source papers, performs multimodal understanding of paper text and figures, builds structured prompts, synthesizes candidate figures, and filters them through a review-driven refinement loop. The resulting benchmark covers multiple figure categories, multiple generation sources, and aligned real-synthetic pairs. We benchmark representative detectors under zero-shot, cross-generator, and degraded-image settings. Results show that current methods fail dramatically in zero-shot transfer, exhibit strong generator-specific overfitting, and remain fragile under common post-processing corruptions. These findings reveal a substantial gap between existing AIGI detection capabilities and the emerging distribution of high-quality scientific figures. We hope this benchmark can serve as a foundation for future research on robust and generalizable scientific-figure forensics. The dataset is available at https://github.com/Joyce-yoyo/SciFigDetect.
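The review-driven refinement loop mentioned in the abstract can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' pipeline: the names `Review`, `refine_figure`, `toy_generate`, and `toy_review` are hypothetical, the generator and reviewer are stand-in callables, and the retry limit is an illustrative choice.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    """Hypothetical reviewer verdict (not from the paper)."""
    accepted: bool
    feedback: str  # notes folded into the next prompt when rejected


def refine_figure(
    prompt: str,
    generate: Callable[[str], str],
    review: Callable[[str], Review],
    max_rounds: int = 3,  # assumed retry budget
) -> Optional[str]:
    """Generate a candidate, review it, and retry with feedback until accepted.

    Returns the accepted candidate, or None if every round is rejected
    (i.e. the candidate is filtered out).
    """
    for _ in range(max_rounds):
        candidate = generate(prompt)
        verdict = review(candidate)
        if verdict.accepted:
            return candidate
        # Append the reviewer's notes so the next attempt can address them.
        prompt = f"{prompt}\nReviewer feedback: {verdict.feedback}"
    return None


# Toy stand-ins so the sketch runs without any model or API.
def toy_generate(prompt: str) -> str:
    # Encodes how many feedback rounds the prompt has absorbed.
    return f"figure<{prompt.count('Reviewer feedback')}>"


def toy_review(candidate: str) -> Review:
    # Accepts only a candidate produced after exactly one round of feedback.
    return Review(accepted=(candidate == "figure<1>"),
                  feedback="axis labels unreadable")
```

With these stubs, `refine_figure("bar chart", toy_generate, toy_review)` is rejected once and accepted on the second round, while a budget of `max_rounds=1` yields `None`, mirroring how a candidate that never passes review would be dropped.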


Key Contributions

  • First benchmark dataset specifically for AI-generated scientific figure detection with structured real-synthetic pairs
  • Agent-based data pipeline that retrieves licensed papers, performs multimodal understanding, and synthesizes figures through review-driven refinement
  • Comprehensive evaluation of existing detectors showing dramatic failure in zero-shot transfer, generator-specific overfitting, and fragility under post-processing
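One way to quantify the generator-specific overfitting named in the contributions above is to tabulate accuracy per (training generator, test generator) pair and compare in-generator with cross-generator scores. The sketch below assumes a simple record format and uses placeholder generator names (`gen_A`, `gen_B`) and made-up toy labels; it does not reproduce the paper's metrics or results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed record format: (train_generator, test_generator, y_true, y_pred)
Record = Tuple[str, str, int, int]


def cross_generator_accuracy(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Accuracy for each (train generator, test generator) pair."""
    hits: Dict[Tuple[str, str], int] = defaultdict(int)
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for train_gen, test_gen, y_true, y_pred in records:
        key = (train_gen, test_gen)
        totals[key] += 1
        hits[key] += int(y_true == y_pred)
    return {key: hits[key] / totals[key] for key in totals}


def generalization_gap(acc: Dict[Tuple[str, str], float]) -> float:
    """Mean in-generator accuracy minus mean cross-generator accuracy."""
    in_gen = [v for (tr, te), v in acc.items() if tr == te]
    cross = [v for (tr, te), v in acc.items() if tr != te]
    return sum(in_gen) / len(in_gen) - sum(cross) / len(cross)


# Toy records for illustration only (1 = synthetic, 0 = real).
toy_records = [
    ("gen_A", "gen_A", 1, 1),
    ("gen_A", "gen_A", 0, 0),
    ("gen_A", "gen_B", 1, 0),  # missed synthetic figure from an unseen generator
    ("gen_A", "gen_B", 0, 0),
]

acc = cross_generator_accuracy(toy_records)
gap = generalization_gap(acc)
```

A large positive gap would indicate a detector that performs well on the generator it was trained on but transfers poorly to others. The same tabulation could be keyed by corruption type instead of test generator to summarize the degraded-image setting.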

🛡️ Threat Analysis

Output Integrity Attack

This paper addresses AI-generated content detection specifically for scientific figures. It creates a benchmark to evaluate detectors that verify whether scientific figures are authentic or AI-synthesized, which falls under output integrity and content authenticity verification. The paper evaluates existing detection methods on this new domain.


Details

Domains
vision, multimodal, nlp
Model Types
multimodal, llm
Threat Tags
inference_time
Datasets
SciFigDetect
Applications
scientific figure forensics, research integrity verification, ai-generated content detection