Latest papers

2 papers
defense · arXiv · Oct 23, 2025

A Reinforcement Learning Framework for Robust and Secure LLM Watermarking

Li An, Yujian Liu, Yepeng Liu et al. · UC Santa Barbara · MIT-IBM Watson AI Lab

An RL framework that jointly optimizes LLM text watermarking for detectability, text quality, robustness to removal, and resistance to spoofing

Output Integrity Attack nlp
1 citation · PDF · Code
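The summary above mentions four objectives optimized simultaneously. As a minimal, hypothetical sketch (not the paper's actual objective or code), such criteria could be folded into a single scalar reward for a policy update:

```python
# Hypothetical multi-objective watermarking reward: the four criteria from
# the summary (detectability, quality, removal robustness, spoofing
# resistance) combined as a weighted sum. Weights and scoring are
# illustrative assumptions, not taken from the paper.

def watermark_reward(detect, quality, robust, antispoof,
                     weights=(0.25, 0.25, 0.25, 0.25)):
    """Each score is assumed to lie in [0, 1]; higher is better."""
    scores = (detect, quality, robust, antispoof)
    return sum(w * s for w, s in zip(weights, scores))
```

A real RL setup would estimate each score with a learned or rule-based judge per generated sample; the weighted-sum form is only one way to trade the objectives off.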
defense · arXiv · Oct 10, 2025

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Yue Huang, Hang Hua, Yujun Zhou et al. · University of Notre Dame · MIT-IBM Watson AI Lab +3 more

Proposes Safiron, a pre-execution guardrail that detects, categorizes, and explains risky LLM agent action plans before they execute

Excessive Agency nlp
5 citations · 1 influential · PDF
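The summary describes a guardrail that detects, categorizes, and explains risky action plans before execution. A toy sketch of that pre-execution pattern (purely illustrative rule matching, not Safiron's actual model) might look like:

```python
# Hypothetical pre-execution guardrail sketch: scan an agent's planned
# actions before running them, returning risk findings with a category
# and a short explanation. Patterns and categories are made up here.

RISKY_PATTERNS = {
    "rm -rf": ("destructive-filesystem", "plan deletes files irreversibly"),
    "curl": ("unvetted-network", "plan fetches remote content"),
    "sudo": ("privilege-escalation", "plan requests elevated privileges"),
}

def screen_plan(actions):
    """Return (allow, findings); findings are (step, category, reason)."""
    findings = []
    for i, action in enumerate(actions):
        for pattern, (category, reason) in RISKY_PATTERNS.items():
            if pattern in action:
                findings.append((i, category, reason))
    return (len(findings) == 0, findings)
```

The paper trains a model on synthetic data rather than using static patterns, but the interface idea is the same: screen the plan, block or flag before anything runs.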