defense arXiv Mar 18, 2026 · 19d ago
De Zhang Lee, Han Fang, Ee-Chien Chang · National University of Singapore
Cryptographic proof-of-authorship for diffusion-generated images by binding generation seeds to author identity using pseudorandom functions
Output Integrity Attack visiongenerative
Recent advancements in AI-generated content (AIGC) have introduced new challenges in intellectual property protection and the authentication of generated objects. We focus on scenarios in which an author seeks to assert authorship of an object generated using latent diffusion models (LDMs), in the presence of adversaries who attempt to falsely claim authorship of objects they did not create. While proof-of-ownership has been studied in the context of multimedia content through techniques such as time-stamping and watermarking, these approaches face notable limitations. In contrast to traditional content creation sources (e.g., cameras), the LDM generation process offers greater control to the author. Specifically, the random seed used during generation can be deliberately chosen. By binding the seed to the author's identity using cryptographic pseudorandom functions, the author can assert to be the creator of the object. We refer to this stronger guarantee as proof-of-authorship, since only the creator of the object can legitimately claim the object. This contrasts with proof-of-ownership via time-stamping or watermarking, where any entity could potentially claim ownership of an object by being the first to timestamp or embed the watermark. We propose a proof-of-authorship framework involving a probabilistic adjudicator who quantifies the probability that a claim is false. Furthermore, unlike prior approaches, the proposed framework does not involve any secret. We explore various attack scenarios and analyze design choices using Stable Diffusion 2.1 (SD2.1) as representative case studies.
diffusion National University of Singapore
defense arXiv Sep 15, 2025 · Sep 2025
De Zhang Lee, Han Fang, Hanyi Wang et al. · National University of Singapore · Shanghai Jiao Tong University
Attacks latent-based AIGC watermarks via boundary leakage, cutting removal distortion 15×; defends with secret boundary transformation provably equal to white-noise.
Output Integrity Attack visiongenerative
Digital watermarks can be embedded into AI-generated content (AIGC) by initializing the generation process with starting points sampled from a secret distribution. When combined with pseudorandom error-correcting codes, such watermarked outputs can remain indistinguishable from unwatermarked objects, while maintaining robustness under whitenoise. In this paper, we go beyond indistinguishability and investigate security under removal attacks. We demonstrate that indistinguishability alone does not necessarily guarantee resistance to adversarial removal. Specifically, we propose a novel attack that exploits boundary information leaked by the locations of watermarked objects. This attack significantly reduces the distortion required to remove watermarks -- by up to a factor of $15 \times$ compared to a baseline whitenoise attack under certain settings. To mitigate such attacks, we introduce a defense mechanism that applies a secret transformation to hide the boundary, and prove that the secret transformation effectively rendering any attacker's perturbations equivalent to those of a naive whitenoise adversary. Our empirical evaluations, conducted on multiple versions of Stable Diffusion, validate the effectiveness of both the attack and the proposed defense, highlighting the importance of addressing boundary leakage in latent-based watermarking schemes.
diffusion National University of Singapore · Shanghai Jiao Tong University
attack arXiv Mar 7, 2026 · 4w ago
Jialai Wang, Ya Wen, Zhongmou Liu et al. · National University of Singapore · Tsinghua University +1 more
Flip-Agent exploits hardware bit-flips to corrupt LLM agent weights, hijacking tool calls and final outputs in multi-stage pipelines
Model Poisoning Excessive Agency nlp
Targeted bit-flip attacks (BFAs) exploit hardware faults to manipulate model parameters, posing a significant security threat. While prior work targets single-step inference models (e.g., image classifiers), LLM-based agents with multi-stage pipelines and external tools present new attack surfaces, which remain unexplored. This work introduces Flip-Agent, the first targeted BFA framework for LLM-based agents, manipulating both final outputs and tool invocations. Our experiments show that Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks, revealing a critical vulnerability in LLM-based agent systems.
llm transformer National University of Singapore · Tsinghua University · Huazhong University of Science and Technology