Latest papers

2 papers
defense · arXiv · Oct 23, 2025

A Reinforcement Learning Framework for Robust and Secure LLM Watermarking

Li An, Yujian Liu, Yepeng Liu et al. · UC Santa Barbara · MIT-IBM Watson AI Lab

An RL framework that jointly optimizes LLM text watermarking for detectability, text quality, robustness to removal, and resistance to spoofing

Output Integrity Attack nlp
1 citation · PDF · Code
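The summary above mentions four objectives optimized simultaneously. As a minimal, hypothetical sketch (not the paper's actual objective or code), such criteria could be folded into a single scalar reward for a policy update:

```python
# Hypothetical multi-objective watermarking reward: the four criteria from
# the summary (detectability, quality, removal robustness, spoofing
# resistance) combined as a weighted sum. Weights and scoring are
# illustrative assumptions, not taken from the paper.

def watermark_reward(detect, quality, robust, antispoof,
                     weights=(0.25, 0.25, 0.25, 0.25)):
    """Each score is assumed to lie in [0, 1]; higher is better."""
    scores = (detect, quality, robust, antispoof)
    return sum(w * s for w, s in zip(weights, scores))
```

A real RL setup would estimate each score with a learned or rule-based judge per generated sample; the weighted-sum form is only one way to trade the objectives off.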
defense · arXiv · Oct 10, 2025

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Yue Huang, Hang Hua, Yujun Zhou et al. · University of Notre Dame · MIT-IBM Watson AI Lab +3 more

Proposes Safiron, a pre-execution guardrail that detects, categorizes, and explains risky LLM agent action plans before they execute

Excessive Agency nlp
5 citations · 1 influential · PDF
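The summary describes a guardrail that detects, categorizes, and explains risky action plans before execution. A toy sketch of that pre-execution pattern (purely illustrative rule matching, not Safiron's actual model) might look like:

```python
# Hypothetical pre-execution guardrail sketch: scan an agent's planned
# actions before running them, returning risk findings with a category
# and a short explanation. Patterns and categories are made up here.

RISKY_PATTERNS = {
    "rm -rf": ("destructive-filesystem", "plan deletes files irreversibly"),
    "curl": ("unvetted-network", "plan fetches remote content"),
    "sudo": ("privilege-escalation", "plan requests elevated privileges"),
}

def screen_plan(actions):
    """Return (allow, findings); findings are (step, category, reason)."""
    findings = []
    for i, action in enumerate(actions):
        for pattern, (category, reason) in RISKY_PATTERNS.items():
            if pattern in action:
                findings.append((i, category, reason))
    return (len(findings) == 0, findings)
```

The paper trains a model on synthetic data rather than using static patterns, but the interface idea is the same: screen the plan, block or flag before anything runs.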