Zhiqing Hu, Chenxu Zhao, Jiazhong Lu et al. · China Academy of Engineering Physics · National Interdisciplinary Research Center of Engineering Physics +1 more
Triple-set vocabulary watermark for LLM text achieves higher detection accuracy than binary KGW while preserving readability
Misuse of LLM-generated text can be curbed by watermarking techniques that embed implicit signals into the output. We propose a watermark that partitions the vocabulary at each decoding step into three sets (Green/Yellow/Red) with fixed ratios and restricts sampling to the Green and Yellow sets. At detection time, we replay the same partitions, compute Green-enrichment and Red-depletion statistics, convert them to one-sided z-scores, and aggregate their p-values via Fisher's method to decide whether a passage is watermarked. We implement generation, detection, and testing on Llama 2 7B, and evaluate true-positive rate, false-positive rate, and text quality. Results show that the triple-partition scheme achieves high detection accuracy at fixed FPR while preserving readability.
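The detection pipeline described above (per-set counts → one-sided z-scores → Fisher-combined p-value) can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the set ratios, the depletion sign convention, and the decision threshold `alpha` are assumptions, and the closed-form chi-square tail used here relies on the even degrees of freedom that Fisher's method produces.

```python
import math

def z_score(count, n, p):
    """One-sided z-score for observing `count` hits out of `n` tokens
    when each token independently lands in the set with probability p."""
    return (count - n * p) / math.sqrt(n * p * (1 - p))

def p_value(z):
    """Upper-tail p-value of a standard normal via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(log p_i) ~ chi^2 with 2k df under H0.
    For even df = 2k the survival function has a closed form:
    sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!"""
    x = -2.0 * sum(math.log(p) for p in p_values)
    k = len(p_values)          # df = 2k
    term, total = 1.0, 1.0     # i = 0 term
    for i in range(1, k):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

def detect(green_count, red_count, n, green_ratio, red_ratio, alpha=1e-3):
    """Combine Green-enrichment and Red-depletion evidence (alpha is illustrative).
    Depletion is a lower-tail event, so the Red z-score is negated."""
    z_green = z_score(green_count, n, green_ratio)
    z_red = -z_score(red_count, n, red_ratio)
    p_combined = fisher_combine([p_value(z_green), p_value(z_red)])
    return p_combined, p_combined < alpha
```

With 200 tokens, a Green ratio of 0.25, and a Red ratio of 0.5, a watermarked-looking sample (90 Green, 40 Red) yields a combined p-value many orders of magnitude below a chance-level sample (52 Green, 98 Red), which stays above any reasonable threshold.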
Tags: llm, transformer
China Academy of Engineering Physics · National Interdisciplinary Research Center of Engineering Physics · Chengdu University of Information Technology
Xingyu Li, Xiaolei Liu, Cheng Liu et al. · National Interdisciplinary Research Center of Engineering Physics · Institute of Computer Application +2 more
Gradient-based adversarial prompt attack forces LLMs into repetitive loops, exhausting compute resources up to max output length
As large language models (LLMs) scale, their inference consumes substantial computational resources, exposing them to energy-latency attacks, in which crafted prompts induce high energy and latency costs. Existing attack methods aim to prolong output by delaying the generation of termination symbols. However, as the output grows longer, controlling the termination symbols through the input becomes difficult, making these methods less effective. We therefore propose LoopLLM, an energy-latency attack framework built on the observation that repetitive generation can trigger low-entropy decoding loops, reliably compelling LLMs to generate until their output limits. LoopLLM introduces (1) a repetition-inducing prompt optimization that exploits autoregressive vulnerabilities to induce repetitive generation, and (2) a token-aligned ensemble optimization that aggregates gradients to improve cross-model transferability. Extensive experiments on 12 open-source and 2 commercial LLMs show that LoopLLM significantly outperforms existing methods, achieving over 90% of the maximum output length, compared to 20% for baselines, and improving transferability by around 40% to DeepSeek-V3 and Gemini 2.5 Flash.
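The core observation, that a low-entropy next-token distribution traps greedy decoding in a cycle that never emits the termination symbol, can be illustrated with a toy model. This is not the paper's attack: the tokens, probabilities, and length cap below are made up, and a tiny Markov table stands in for an LLM's next-token distribution.

```python
import math

# Toy "language model": each row is a peaked (low-entropy) next-token
# distribution, as if an adversarial prompt had pushed the model into
# a repetitive regime. Values are illustrative, not from the paper.
NEXT = {
    "the":   {"loop": 0.97, "again": 0.02, "<eos>": 0.01},
    "loop":  {"again": 0.97, "the": 0.02, "<eos>": 0.01},
    "again": {"the": 0.97, "loop": 0.02, "<eos>": 0.01},
}

def entropy_bits(dist):
    """Shannon entropy of a next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def greedy_decode(start, max_len=50):
    """Argmax decoding until <eos> or the output-length cap."""
    out = [start]
    while len(out) < max_len:
        row = NEXT[out[-1]]
        nxt = max(row, key=row.get)  # greedy: pick the modal token
        if nxt == "<eos>":
            break
        out.append(nxt)
    return out

gen = greedy_decode("the")
# The argmax in every state continues the the->loop->again cycle, so
# <eos> is never selected and generation runs to the length cap.
```

Because each row's modal token continues the cycle, the decode always reaches `max_len`, which is exactly the resource-exhaustion behavior the attack aims to induce; every row's entropy here is well under one bit.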
Tags: llm, transformer
National Interdisciplinary Research Center of Engineering Physics · Institute of Computer Application · Beijing University of Posts and Telecommunications +1 more