BanglaLorica: Design and Evaluation of a Robust Watermarking Algorithm for Large Language Models in Bangla Text Generation
Amit Bin Tariqul , A N M Zahid Hossain Milkan , Sahab-Al-Chowdhury , Syed Rifat Raiyan , Hasan Mahmud , Md Kamrul Hasan
Published on arXiv
2601.04534
Output Integrity Attack
OWASP ML Top 10 — ML09
Key Finding
Layered watermarking restores post-RTT detection accuracy to 40–50%, versus 9–13% for single-layer methods, a 3–4× relative improvement with controlled semantic degradation.
BanglaLorica (Layered Watermarking)
Novel technique introduced
As large language models (LLMs) are increasingly deployed for text generation, watermarking has become essential for authorship attribution, intellectual property protection, and misuse detection. While existing watermarking methods perform well in high-resource languages, their robustness in low-resource languages remains underexplored. This work presents the first systematic evaluation of state-of-the-art text watermarking methods, KGW, Exponential Sampling (EXP), and Waterfall, for Bangla LLM text generation under cross-lingual round-trip translation (RTT) attacks. Under benign conditions, KGW and EXP achieve high detection accuracy (>88%) with negligible perplexity and ROUGE degradation. However, RTT causes detection accuracy to collapse to 9–13%, indicating a fundamental failure of token-level watermarking. To address this, we propose a layered watermarking strategy that combines embedding-time and post-generation watermarks. Experimental results show that layered watermarking improves post-RTT detection accuracy by 25–35 percentage points, reaching 40–50%, a 3× to 4× relative improvement over single-layer methods, at the cost of controlled semantic degradation. Our findings quantify the robustness-quality trade-off in multilingual watermarking and establish layered watermarking as a practical, training-free solution for low-resource languages such as Bangla. Our code and data will be made public.
Key Contributions
- First systematic evaluation of KGW, EXP, and Waterfall watermarking methods for Bangla LLM text under cross-lingual round-trip translation (RTT) attacks, showing detection collapse to 9–13%
- Proposes BanglaLorica: a training-free double-layer watermarking strategy combining embedding-time and post-generation watermarks to create an independent statistical signal
- Demonstrates 3×–4× relative detection improvement post-RTT (40–50% vs. 9–13%) and quantifies the robustness–quality trade-off in low-resource multilingual watermarking
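The contributions above hinge on fusing two independent watermark signals (embedding-time and post-generation) into one decision. The paper does not spell out its fusion rule here, but a standard, training-free way to combine two independent detector p-values is Fisher's method, sketched below under that assumption; `fisher_combine` is an illustrative name, not the paper's API.

```python
import math


def fisher_combine(p1: float, p2: float) -> float:
    # Fisher's method: under H0 (no watermark), the statistic
    # -2 * (ln p1 + ln p2) follows a chi-square distribution with 4 dof.
    stat = -2.0 * (math.log(p1) + math.log(p2))
    # Chi-square(4) survival function has the closed form exp(-x/2) * (1 + x/2).
    return math.exp(-stat / 2.0) * (1.0 + stat / 2.0)
```

Two individually weak post-RTT signals (e.g. p = 0.04 each) combine into a stronger joint rejection, which is the intuition behind a layered watermark surviving an attack that cripples either layer alone.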
🛡️ Threat Analysis
The paper embeds verifiable statistical signals in LLM-generated text outputs (not in model weights) for authorship attribution and AI-generated content provenance, a classic output-integrity/content-watermarking scenario. The RTT attack evaluated is a watermark-evasion (removal) attack on those output signals, which also falls under ML09.