defense arXiv Oct 2, 2025
Liyan Xie, Muhammad Siddeek, Mohamed Seif et al. · University of Minnesota · Princeton University +2 more
Combinatorial vocabulary-partitioning watermark for LLM text that detects and localizes post-generation edits and spoofing attacks
Output Integrity Attack nlp
Watermarking has become a key technique for proprietary language models, enabling the distinction between AI-generated and human-written text. However, in many real-world scenarios, LLM-generated content may undergo post-generation edits, such as human revisions or even spoofing attacks, making it critical to detect and localize such modifications. In this work, we introduce a new task: detecting and localizing post-generation edits made to watermarked LLM outputs. To this end, we propose a combinatorial pattern-based watermarking framework, which partitions the vocabulary into disjoint subsets and embeds the watermark by enforcing a deterministic combinatorial pattern over these subsets during generation. We pair the combinatorial watermark with a global statistic for detecting the watermark. Furthermore, we design lightweight local statistics to flag and localize potential edits. We introduce two task-specific evaluation metrics, Type-I error rate and detection accuracy, and evaluate our method on open-source LLMs across a variety of editing scenarios, demonstrating strong empirical performance in edit localization.
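The abstract above can be illustrated with a minimal sketch. Everything below is an assumption based only on the abstract: the partition is a seeded shuffle, the "deterministic combinatorial pattern" is modeled as a simple cycle over the subsets, and the global/local statistics are plain hit-rate counts; the paper's actual construction is not specified here.

```python
# Hypothetical sketch of a combinatorial vocabulary-partitioning watermark.
# partition_vocab, expected_pattern, and the cyclic pattern are assumptions,
# not the paper's actual scheme.
import random


def partition_vocab(vocab_size, k, seed=0):
    """Split token ids into k disjoint, pseudo-random subsets."""
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return {tok: i % k for i, tok in enumerate(ids)}


def expected_pattern(position, k):
    """Deterministic pattern over subsets: here, a simple cycle."""
    return position % k


def global_statistic(tokens, labels, k):
    """Fraction of positions whose token lies in the expected subset.
    Roughly 1/k for unwatermarked text, close to 1 for intact output."""
    hits = sum(1 for i, t in enumerate(tokens)
               if labels[t] == expected_pattern(i, k))
    return hits / len(tokens)


def localize_edits(tokens, labels, k, window=5, threshold=0.5):
    """Flag sliding windows whose local hit rate drops below threshold."""
    flagged = []
    for start in range(len(tokens) - window + 1):
        hits = sum(1 for i in range(start, start + window)
                   if labels[tokens[i]] == expected_pattern(i, k))
        if hits / window < threshold:
            flagged.append(start)
    return flagged
```

A watermarked sequence (every token drawn from the expected subset) scores 1.0 on the global statistic; replacing a run of tokens with out-of-subset ones drives the local hit rate down over exactly those windows, which is the localization signal.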
llm transformer · Google +1 more
defense arXiv Oct 23, 2025
Mohamed Seif, Malcolm Egan, Andrea J. Goldsmith et al. · Princeton University · INRIA +1 more
Defends against adversarial inversion of ML feature embeddings during wireless transmission using differential privacy and channel-aware encoding
Model Inversion Attack vision
AI-based sensing at wireless edge devices has the potential to significantly enhance Artificial Intelligence (AI) applications, particularly for vision and perception tasks such as autonomous driving and environmental monitoring. AI systems rely on both efficient model learning and inference. In the inference phase, features extracted from sensing data are utilized for prediction tasks (e.g., classification or regression). In edge networks, sensors and model servers are often not co-located, which requires communication of features. As sensitive personal data can be reconstructed by an adversary, transformations of the features are required to reduce the risk of privacy violations. While differential privacy mechanisms provide a means of protecting finite datasets, protection of individual features has not been addressed. In this paper, we propose a novel framework for privacy-preserving AI-based sensing, where devices apply transformations of extracted features before transmission to a model server.
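As a rough illustration of the idea of transforming features before transmission, the sketch below privatizes a feature vector with the standard Gaussian mechanism (L2 clipping plus calibrated noise). This is an assumption for illustration only: the clipping bound `C`, the `(epsilon, delta)` parameters, and the use of the Gaussian mechanism are not taken from the paper, which also incorporates channel-aware encoding not modeled here.

```python
# Hypothetical sketch: privatizing a feature vector before transmission.
# The Gaussian mechanism and all parameters are illustrative assumptions.
import math
import random


def clip(feature, C):
    """Scale the vector so its L2 norm is at most C (bounds sensitivity)."""
    norm = math.sqrt(sum(x * x for x in feature))
    if norm > C:
        return [x * C / norm for x in feature]
    return list(feature)


def gaussian_mechanism(feature, C, epsilon, delta, rng=None):
    """Clip the feature, then add Gaussian noise calibrated to
    L2 sensitivity C for (epsilon, delta)-differential privacy."""
    rng = rng or random.Random()
    clipped = clip(feature, C)
    sigma = C * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return [x + rng.gauss(0, sigma) for x in clipped]
```

The device would apply `gaussian_mechanism` to each extracted feature vector before sending it to the model server, trading prediction accuracy against reconstruction risk via the noise scale.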
cnn · Stony Brook University