benchmark 2026

VTONGuard: Automatic Detection and Authentication of AI-Generated Virtual Try-On Content

Shengyi Wu 1, Yan Hong 2, Shengyao Chen 1, Zheng Wang 1, Xianbing Sun 1, Jiahui Zhan 1, Jun Lan 2, Jianfu Zhang 1

0 citations · 32 references · arXiv


Published on arXiv: 2601.13951

Output Integrity Attack (OWASP ML Top 10: ML09)

Key Finding

The multi-task segmentation-augmented detector (MiT-B2-MT) achieves best overall performance on VTONGuard, while experiments reveal that cross-paradigm generalization remains an unsolved challenge for VTON-specific deepfake detection.

Novel technique introduced: MiT-B2-MT


With the rapid advancement of generative AI, virtual try-on (VTON) systems are becoming increasingly common in e-commerce and digital entertainment. However, the growing realism of AI-generated try-on content raises pressing concerns about authenticity and responsible use. To address this, we present VTONGuard, a large-scale benchmark dataset containing over 775,000 real and synthetic try-on images. The dataset covers diverse real-world conditions, including variations in pose, background, and garment styles, and provides both authentic and manipulated examples. Based on this benchmark, we conduct a systematic evaluation of multiple detection paradigms under unified training and testing protocols. Our results reveal each method's strengths and weaknesses and highlight the persistent challenge of cross-paradigm generalization. To further advance detection, we design a multi-task framework that integrates auxiliary segmentation to enhance boundary-aware feature learning, achieving the best overall performance on VTONGuard. We expect this benchmark to enable fair comparisons, facilitate the development of more robust detection models, and promote the safe and responsible deployment of VTON technologies in practice.
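The cross-paradigm generalization question can be made concrete with a train-group × test-group AUC matrix. You fit a detector on one group of images, score every group's held-out split, and compare the off-diagonal entries with the in-domain diagonal. The sketch below illustrates this under stated assumptions; it is not the VTONGuard protocol. The summary does not define how the groups are formed, so we assume they are families of try-on generators. The group names `gan` and `diffusion`, the toy two-feature samples, and the `train_best_feature` stand-in detector are all hypothetical. AUC uses the Mann-Whitney pairwise formulation.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# (feature vector, label) with label 1 = synthetic try-on, 0 = real photo
Sample = Tuple[List[float], int]
Detector = Callable[[List[float]], float]  # maps features to a "fake" score


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both real (0) and synthetic (1) samples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def generalization_matrix(
    splits: Dict[str, Tuple[List[Sample], List[Sample]]],
    train_fn: Callable[[List[Sample]], Detector],
) -> Dict[Tuple[str, str], float]:
    """AUC for every (train group, test group) pair; the diagonal is in-domain."""
    detectors = {g: train_fn(train) for g, (train, _) in splits.items()}
    matrix = {}
    for tr, te in product(splits, splits):
        test = splits[te][1]
        scores = [detectors[tr](x) for x, _ in test]
        matrix[(tr, te)] = roc_auc([y for _, y in test], scores)
    return matrix


def train_best_feature(train: List[Sample]) -> Detector:
    """Hypothetical toy detector: score with the feature whose fake/real mean gap is largest."""
    dim = len(train[0][0])

    def gap(i: int) -> float:
        fake = [x[i] for x, y in train if y == 1]
        real = [x[i] for x, y in train if y == 0]
        return sum(fake) / len(fake) - sum(real) / len(real)

    best = max(range(dim), key=gap)
    return lambda x: x[best]


# Toy data: each (hypothetical) generator family leaves its trace in a different feature.
real = [([0.1, 0.2], 0), ([0.2, 0.1], 0)]
splits = {
    "gan": ([([0.9, 0.1], 1), ([0.8, 0.2], 1)] + real,
            [([0.85, 0.15], 1), ([0.95, 0.1], 1), ([0.15, 0.1], 0), ([0.1, 0.15], 0)]),
    "diffusion": ([([0.1, 0.9], 1), ([0.2, 0.8], 1)] + real,
                  [([0.15, 0.85], 1), ([0.1, 0.95], 1), ([0.15, 0.1], 0), ([0.1, 0.15], 0)]),
}
matrix = generalization_matrix(splits, train_best_feature)
for (tr, te), auc in sorted(matrix.items()):
    print(f"train={tr:<9} test={te:<9} AUC={auc:.2f}")
```

On this toy data, each detector reaches an AUC of 1.0 on its own group but only 0.5 (chance) on the other group. That off-diagonal drop is the kind of generalization gap the benchmark is designed to expose.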


Key Contributions

  • VTONGuard: a large-scale benchmark of 775,000+ real and AI-generated virtual try-on images covering diverse poses, backgrounds, and garment styles, with unified training/testing protocols for evaluating detectors
  • Systematic cross-paradigm evaluation of CNN, transformer-based, and frequency-domain detectors revealing persistent generalization gaps across detection paradigms
  • MiT-B2-MT: a multi-task detection framework integrating auxiliary segmentation for boundary-aware feature learning, achieving best overall performance on VTONGuard
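The multi-task idea behind MiT-B2-MT can be illustrated with a joint objective. It combines an image-level real/synthetic loss with a weighted per-pixel segmentation loss on the manipulated region. The sketch below is a minimal pure-Python illustration, not the authors' loss. It rests on several assumptions the summary does not confirm:

- the segmentation target is a binary mask of the edited (try-on) region;
- both terms are binary cross-entropy;
- `lam` is a free weighting hyperparameter;
- the boundary up-weighting (the hypothetical `boundary_map` and `boundary_weight`) is just one possible way to make the auxiliary loss boundary-aware.

In a real model, the two predictions would presumably come from separate classification and segmentation heads on a shared backbone. That part is not modelled here.

```python
import math
from typing import List

Grid = List[List[float]]  # row-major H x W values


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability p and target y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def boundary_map(mask: Grid) -> Grid:
    """Hypothetical helper: 1.0 where any 4-neighbour has a different mask value."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and mask[ni][nj] != mask[i][j]:
                    out[i][j] = 1.0
                    break
    return out


def multitask_loss(p_fake: float, is_fake: int, seg_prob: Grid, mask: Grid,
                   lam: float = 1.0, boundary_weight: float = 2.0) -> float:
    """Image-level BCE plus lam * boundary-weighted mean per-pixel BCE (assumed form)."""
    cls = bce(p_fake, is_fake)
    edges = boundary_map(mask)
    total, norm = 0.0, 0.0
    for prow, mrow, erow in zip(seg_prob, mask, edges):
        for p, y, e in zip(prow, mrow, erow):
            wgt = 1.0 + (boundary_weight - 1.0) * e  # boundary pixels weigh more
            total += wgt * bce(p, y)
            norm += wgt
    return cls + lam * total / norm


# Toy 4x4 example: the centre 2x2 block is the (hypothetical) edited garment region.
mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred = [[0.1, 0.2, 0.2, 0.1], [0.2, 0.8, 0.7, 0.2],
        [0.2, 0.9, 0.6, 0.1], [0.1, 0.1, 0.2, 0.1]]
print(multitask_loss(p_fake=0.9, is_fake=1, seg_prob=pred, mask=mask, lam=0.5))
```

Two sanity checks follow from the formula. With a perfect prediction the loss is close to zero. With every probability at 0.5, both terms equal ln 2, so the total is 2 ln 2 when lam = 1, regardless of the boundary weights.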

🛡️ Threat Analysis

Output Integrity Attack

Directly addresses AI-generated image detection and content authentication: proposes a specialized benchmark and novel multi-task detection architecture (MiT-B2-MT) to distinguish real from synthetic virtual try-on images, a core output integrity and content provenance problem.


Details

Domains
vision, generative
Model Types
cnn, transformer, diffusion
Threat Tags
inference_time, digital
Datasets
VTONGuard
Applications
ai-generated image detection, e-commerce content authentication, virtual try-on forensics