Latest papers

2 papers
defense arXiv Oct 23, 2025 · Oct 2025

A Reinforcement Learning Framework for Robust and Secure LLM Watermarking

Li An, Yujian Liu, Yepeng Liu et al. · UC Santa Barbara · MIT-IBM Watson AI Lab

RL framework optimizes LLM text watermarking for detectability, quality, removal robustness, and spoofing resistance simultaneously

Output Integrity Attack nlp
1 citation · PDF · Code
defense arXiv Oct 20, 2025 · Oct 2025

BlueCodeAgent: A Blue Teaming Agent Enabled by Automated Red Teaming for CodeGen AI

Chengquan Guo, Yuzhou Nie, Chulin Xie et al. · University of Chicago · UC Santa Barbara +3 more

Blue-teaming agent for CodeGen LLMs that uses automated red teaming to detect malicious instructions and vulnerable code outputs

Prompt Injection nlp
PDF