defense · arXiv · Sep 29, 2025
Benedetta Tondi, Andrea Costanzo, Mauro Barni · University of Siena
Embeds high-payload semantic text watermarks in large AI-generated images to enable provenance tracking and manipulation detection
Output Integrity Attack · vision · generative · nlp
We propose a high-payload image watermarking method for textual embedding, in which a semantic description of the image (which may also correspond to the input text prompt) is embedded inside the image. To robustly embed high payloads in large-scale images, such as those produced by modern AI generators, the proposed approach builds upon a traditional watermarking scheme that exploits orthogonal and turbo codes for improved robustness, and integrates frequency-domain embedding and perceptual masking techniques to enhance watermark imperceptibility. Experiments show that the proposed method is extremely robust against a wide variety of image processing operations, and that the embedded text can be retrieved even after traditional and AI-based inpainting, allowing semantic modifications of the image to be unveiled via image-text mismatch analysis.
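The basic mechanism the abstract relies on, spreading each payload bit over many pixels so the text survives heavy processing, can be illustrated with a minimal non-blind spread-spectrum sketch. This is not the paper's scheme (which adds orthogonal and turbo coding, frequency-domain embedding, and perceptual masking); `embed`, `extract`, and the parameters `alpha` and `seed` are illustrative assumptions.

```python
import numpy as np

def text_to_bits(text):
    # UTF-8 bytes of the text, unpacked to a flat array of 0/1 bits.
    return np.unpackbits(np.frombuffer(text.encode("utf-8"), dtype=np.uint8))

def bits_to_text(bits):
    return np.packbits(bits).tobytes().decode("utf-8", errors="replace")

def embed(image, text, alpha=2.0, seed=0):
    # Spread-spectrum embedding: each bit modulates a pseudorandom
    # pattern added to a disjoint block of the grayscale image.
    bits = text_to_bits(text)
    h, w = image.shape
    block = (h * w) // len(bits)          # pixels available per bit
    rng = np.random.default_rng(seed)
    patterns = rng.standard_normal((len(bits), block))
    flat = image.astype(np.float64).ravel().copy()
    for i, b in enumerate(bits):
        sign = 1.0 if b else -1.0
        flat[i * block:(i + 1) * block] += alpha * sign * patterns[i]
    return flat.reshape(h, w), patterns

def extract(marked, original, n_bits, patterns):
    # Non-blind detection: correlate the embedding residual with each
    # pattern; the sign of the correlation recovers the bit.
    block = patterns.shape[1]
    diff = (marked - original.astype(np.float64)).ravel()
    bits = np.empty(n_bits, dtype=np.uint8)
    for i in range(n_bits):
        corr = diff[i * block:(i + 1) * block] @ patterns[i]
        bits[i] = 1 if corr > 0 else 0
    return bits
```

With a 1024x1024 image and, say, a 500-character description (4000 bits), each bit still spans over 250 pixels, which is why high payloads become feasible on large AI-generated images; channel coding then buys robustness on top of this redundancy.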
diffusion
benchmark · arXiv · Oct 28, 2025
Pietro Bongini, Valentina Molinari, Andrea Costanzo et al. · University of Siena · IMT School
Training-free one-shot method attributes synthetic images to source generators via resynthesis and CLIP feature comparison, with a new benchmark dataset
Output Integrity Attack · vision · generative
Synthetic image source attribution is a challenging task, especially under data scarcity, where few-shot or zero-shot classification capabilities are required. We present a new training-free one-shot attribution method based on image resynthesis. A prompt describing the image under analysis is generated and then used to resynthesize the image with each candidate source. The image is attributed to the model whose resynthesis is closest to the original image in a suitable feature space. We also introduce a new dataset for synthetic image attribution consisting of face images from commercial and open-source text-to-image generators. The dataset provides a challenging attribution framework, useful for developing new attribution models and testing their capabilities on different generative architectures. Its structure makes it possible to test resynthesis-based approaches and to compare them with few-shot methods. Results from state-of-the-art few-shot approaches and other baselines show that the proposed resynthesis method outperforms existing techniques when only a few samples are available for training or fine-tuning. The experiments also demonstrate that the new dataset is challenging and represents a valuable benchmark for developing and evaluating future few-shot and zero-shot methods.
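Once each candidate generator has resynthesized the image from the generated prompt, the final decision reduces to a nearest-neighbor rule in feature space. A minimal sketch of that decision step, assuming feature vectors are already available (the summary indicates CLIP image features; the toy vectors and the `attribute` helper below are placeholders, not the paper's code):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def attribute(query_feat, resynth_feats):
    # query_feat: feature vector of the image under analysis.
    # resynth_feats: {model_name: feature vector of that model's resynthesis}.
    # Attribute to the model whose resynthesis is closest in feature space.
    scores = {m: cosine(query_feat, f) for m, f in resynth_feats.items()}
    return max(scores, key=scores.get), scores
```

The approach is training-free because the only learned component is the frozen feature extractor; adding a new candidate generator just means producing one more resynthesis and one more feature vector.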
diffusion · multimodal