defense 2025

TAB-DRW: A DFT-based Robust Watermark for Generative Tabular Data

Yizhou Zhao ¹, Xiang Li ¹, Peter Song ², Qi Long ¹, Weijie Su ¹

¹ University of Pennsylvania

² University of Michigan

0 citations · 51 references · arXiv

Published on arXiv

2511.21600

Output Integrity Attack

OWASP ML Top 10 — ML09

Key Finding

TAB-DRW achieves strong detectability and robustness against common post-processing attacks while preserving high data fidelity and fully supporting mixed continuous/discrete tabular features. The authors report that it outperforms prior methods (TabWak, MUSE, GLW, TabularMark) on all four criteria: detectability, robustness, fidelity, and mixed-type support.

TAB-DRW

Novel technique introduced

The rise of generative AI has enabled the production of high-fidelity synthetic tabular data across fields such as healthcare, finance, and public policy, raising growing concerns about data provenance and misuse. Watermarking offers a promising way to address these concerns by making synthetic data traceable, but existing methods have several limitations: they are computationally expensive due to their reliance on large diffusion models, struggle with mixed discrete-continuous data, or lack robustness to post-generation modifications. To address these limitations, we propose TAB-DRW, an efficient and robust post-editing watermarking scheme for generative tabular data. TAB-DRW embeds watermark signals in the frequency domain: it normalizes heterogeneous features via the Yeo-Johnson transformation and standardization, applies the discrete Fourier transform (DFT), and adjusts the imaginary parts of adaptively selected entries according to precomputed pseudorandom bits. To further enhance robustness and efficiency, we introduce a novel rank-based pseudorandom bit generation method that enables row-wise retrieval without incurring storage overhead. Experiments on five benchmark tabular datasets show that TAB-DRW achieves strong detectability and robustness against common post-processing attacks, while preserving high data fidelity and fully supporting mixed-type features.
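
The frequency-domain step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: each row is already normalized to real values, the DFT is taken across the features of a single row, and one bit is encoded as the sign of the imaginary part of a chosen frequency bin. The names `embed`, `extract`, `positions`, and `strength` are hypothetical, and the adaptive selection of entries is replaced here by a fixed list of bins. This is not the authors' implementation.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def embed(row, bits, positions, strength=0.5):
    """Encode each bit as the sign of Im(X[k]) (assumed rule, for illustration).

    Every k in `positions` must satisfy 0 < k < len(row) / 2, because the
    bins at 0 and n/2 of a real signal have zero imaginary part.
    """
    n = len(row)
    X = dft(row)
    for k, bit in zip(positions, bits):
        sign = 1.0 if bit else -1.0
        # Keep the magnitude at least `strength` so the sign survives small edits.
        mag = max(abs(X[k].imag), strength)
        X[k] = complex(X[k].real, sign * mag)
        # Mirror into the conjugate bin so the inverse transform stays real.
        X[n - k] = X[k].conjugate()
    return [v.real for v in idft(X)]


def extract(row, positions):
    """Read the bits back from the signs of the imaginary parts."""
    X = dft(row)
    return [1 if X[k].imag > 0 else 0 for k in positions]


row = [0.3, -1.2, 0.8, 0.1, -0.5, 1.4, -0.9, 0.2]
bits = [1, 0, 1]
positions = [1, 2, 3]
marked = embed(row, bits, positions)
recovered = extract(marked, positions)
```

In this sketch, a perturbation of a row can shift any imaginary part by at most the sum of the absolute changes, so edits whose total size stays below `strength` cannot flip a recovered bit. Larger `strength` values trade fidelity for robustness.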

Key Contributions

Post-editing frequency-domain watermarking scheme (TAB-DRW) that normalizes heterogeneous tabular features via Yeo-Johnson transformation and embeds pseudorandom bits in DFT imaginary components
Novel rank-based pseudorandom bit generation enabling row-wise watermark retrieval without storage overhead
Demonstrated robustness against common post-processing attacks (deletion, value modification) across five benchmark datasets while supporting mixed continuous/discrete tabular data
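
The normalization named in the first contribution can be sketched as follows. The Yeo-Johnson formulas are the standard ones; everything else is an assumption made for illustration: the parameter `lam` is fixed by hand rather than fitted (in practice it is usually estimated by maximum likelihood), and the helper names `fit_column`, `normalize`, and `denormalize` are hypothetical. How TAB-DRW maps discrete features to real values is not described in the abstract and is not covered here.

```python
import math


def yeo_johnson(x, lam):
    """Standard Yeo-Johnson power transform of a scalar."""
    if x >= 0:
        return math.log1p(x) if lam == 0 else ((x + 1) ** lam - 1) / lam
    return -math.log1p(-x) if lam == 2 else -((1 - x) ** (2 - lam) - 1) / (2 - lam)


def yeo_johnson_inverse(y, lam):
    """Inverse of yeo_johnson; the sign of y matches the sign of the input x."""
    if y >= 0:
        return math.expm1(y) if lam == 0 else (lam * y + 1) ** (1 / lam) - 1
    return -math.expm1(-y) if lam == 2 else 1 - (1 - (2 - lam) * y) ** (1 / (2 - lam))


def fit_column(values, lam):
    """Mean and population standard deviation of the transformed column."""
    t = [yeo_johnson(v, lam) for v in values]
    mean = sum(t) / len(t)
    std = math.sqrt(sum((u - mean) ** 2 for u in t) / len(t)) or 1.0
    return mean, std


def normalize(v, lam, mean, std):
    """Transform, then standardize, one value."""
    return (yeo_johnson(v, lam) - mean) / std


def denormalize(z, lam, mean, std):
    """Undo standardization, then undo the power transform."""
    return yeo_johnson_inverse(z * std + mean, lam)


column = [-3.0, 0.0, 1.5, 10.0, 42.0]
lam = 0.5  # hand-picked for the example, not fitted
mean, std = fit_column(column, lam)
z = [normalize(v, lam, mean, std) for v in column]
restored = [denormalize(u, lam, mean, std) for u in z]
```

Because both steps are invertible, a watermark added in the normalized space can be mapped back to the original feature scale, which is what a post-editing scheme needs in order to return usable data.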

🛡️ Threat Analysis

Output Integrity Attack

Embeds invisible statistical watermarks in the OUTPUTS of generative models (synthetic tabular data) to trace content provenance and detect misuse — this is content watermarking of model-generated data, not model weight watermarking. The watermark lives in the generated data itself, not in the model, making it squarely an output integrity / content provenance mechanism.

Details

Domains

tabular, generative

Model Types

generative

Threat Tags

inference_time, digital

Datasets

Five benchmark tabular datasets (not named in the abstract)

Applications

synthetic tabular data generation, data provenance tracking, healthcare data sharing, financial data generation

Similar Papers

Q-Tag: Watermarking Quantum Circuit Generative Models

OSI: One-step Inversion Excels in Extracting Diffusion Watermarks

Robust Detection of Synthetic Tabular Data under Schema Variability

Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection

Seeing Through the Blur: Unlocking Defocus Maps for Deepfake Detection

TokenTrace: Multi-Concept Attribution through Watermarked Token Recovery

Authenticated Contradictions from Desynchronized Provenance and Watermarking

BlurGuard: A Simple Approach for Robustifying Image Protection Against AI-Powered Editing