tool 2026

FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

0 citations · 28 references · arXiv (Cornell University)

Published on arXiv

2602.18880

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

FOCA outperforms state-of-the-art image forgery detection and localization (IFDL) methods, including MVSS-Net, HiFi-Net, SIDA, and ForgeryGPT, in both detection accuracy and interpretability across the spatial and frequency domains on the newly introduced FSE-Set benchmark.

FOCA

Novel technique introduced

Advances in image tampering techniques, particularly generative models, pose significant challenges to media verification, digital forensics, and public trust. Existing image forgery detection and localization (IFDL) methods suffer from two key limitations: over-reliance on semantic content while neglecting textural cues, and limited interpretability of subtle low-level tampering traces. To address these issues, we propose FOCA, a multimodal large language model-based framework that integrates discriminative features from both the RGB spatial and frequency domains via a cross-attention fusion module. This design enables accurate forgery detection and localization while providing explicit, human-interpretable cross-domain explanations. We further introduce FSE-Set, a large-scale dataset with diverse authentic and tampered images, pixel-level masks, and dual-domain annotations. Extensive experiments show that FOCA outperforms state-of-the-art methods in detection performance and interpretability across both spatial and frequency domains.
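The cross-attention fusion described in the abstract can be illustrated with a minimal sketch. This is not the paper's fusion module: it assumes a single attention head, identity projections instead of learned query/key/value weights, and a residual sum as the fusion step. The names `softmax` and `cross_attention_fuse` are hypothetical. Spatial (RGB) tokens act as queries and frequency tokens act as keys and values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(spatial_tokens, freq_tokens):
    """Fuse each spatial token with an attention-weighted mix of frequency tokens.

    Assumption: no learned projections. A real module would apply trained
    query/key/value matrices before computing the scores.
    """
    dim = len(spatial_tokens[0])
    scale = math.sqrt(dim)
    fused = []
    for query in spatial_tokens:
        # Scaled dot-product score between this spatial token and every frequency token.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in freq_tokens]
        weights = softmax(scores)
        # Weighted average of the frequency tokens (used as values).
        context = [
            sum(w * value[d] for w, value in zip(weights, freq_tokens))
            for d in range(dim)
        ]
        # Residual connection: keep the spatial feature and add the frequency context.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


# Two spatial tokens attend over three frequency tokens of the same width.
spatial = [[1.0, 0.0], [0.0, 1.0]]
frequency = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused_tokens = cross_attention_fuse(spatial, frequency)
```

The output has one fused token per spatial token, so a downstream detection or localization head would see the same sequence length as the spatial branch alone.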

Key Contributions

FOCA: first MLLM-based framework fusing RGB spatial and frequency-domain features via a Frequency Attention Fusion (FAF) module for interpretable forgery detection and pixel-level localization
FSE-Set: large-scale dataset of 100,000 images with pixel-level tampering masks and dual-domain (RGB + frequency) natural language annotations for training and evaluating explainable IFDL systems
Human-interpretable cross-domain explanations of tampering artifacts via MLLM reasoning over both spatial and wavelet-frequency cues
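To make the wavelet-frequency cues mentioned above concrete, the sketch below computes one level of a 2D Haar wavelet transform in plain Python. The choice of the Haar wavelet is an assumption for illustration, since this summary does not say which wavelet FOCA uses, and the function name `haar_dwt2` is hypothetical. The detail subbands are zero on smooth regions and nonzero at abrupt intensity changes, which is the kind of low-level trace a frequency branch can pick up.

```python
def haar_dwt2(image):
    """One level of an orthonormal 2D Haar transform.

    `image` is a list of rows of floats with even height and width.
    Returns (approx, row_detail, col_detail, diag_detail), each half-size.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")

    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for r in range(0, rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The four pixels of one 2x2 block.
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2)  # low-pass average
            r_row.append((tl + tr - bl - br) / 2)  # change between the two rows
            c_row.append((tl - tr + bl - br) / 2)  # change between the two columns
            d_row.append((tl - tr - bl + br) / 2)  # diagonal change
        approx.append(a_row)
        row_detail.append(r_row)
        col_detail.append(c_row)
        diag_detail.append(d_row)
    return approx, row_detail, col_detail, diag_detail


# A 2x2 patch with a sharp vertical edge between its two columns.
edge_patch = [[0.0, 1.0],
              [0.0, 1.0]]
approx, row_detail, col_detail, diag_detail = haar_dwt2(edge_patch)
```

For the edge patch, only the column-difference subband is nonzero. A perfectly flat patch would produce zeros in all three detail subbands.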

🛡️ Threat Analysis

Output Integrity Attack

FOCA is a detection system for AI-generated and manipulated image content: it identifies and localizes tampered regions produced by generative models. ML09 explicitly covers AI-generated content detection (deepfakes, synthetic image detection). The paper introduces both a detection framework and a dataset (FSE-Set) for evaluating image forgery detection, which is a content integrity and provenance problem.

Details

Domains

vision, multimodal

Model Types

vlm, transformer

Threat Tags

inference_time

Datasets

FSE-Set

Applications

image forgery detection, digital forensics, media verification, deepfake localization

Similar Papers

Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning

ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation

UniShield: An Adaptive Multi-Agent Framework for Unified Forgery Image Detection and Localization

SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs

DF-LLaVA: Unlocking MLLMs for Synthetic Image Detection via Knowledge Injection and Conflict-Driven Self-Reflection

The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds

Zoom-In to Sort AI-Generated Images Out

Evidence Packing for Cross-Domain Image Deepfake Detection with LVLMs