Zhou Yang

h-index: 9 · 333 citations · 23 papers (total)

Papers in Database (1)

defense · arXiv · Feb 21, 2026

Watermarking LLM Agent Trajectories

Wenlong Meng, Chen Gong, Terry Yue Zhuo et al. · Zhejiang University · University of Virginia +2 more

Watermarks LLM agent training trajectories so models trained on stolen datasets emit detectable hook behaviors under a secret key

Output Integrity Attack · nlp · reinforcement-learning