ML Security Papers

defense · arXiv · Oct 23, 2025

Li An, Yujian Liu, Yepeng Liu et al. · UC Santa Barbara · MIT-IBM Watson AI Lab

RL framework that jointly optimizes LLM text watermarking for detectability, text quality, robustness to removal, and resistance to spoofing

Output Integrity Attack nlp

1 citation · PDF · Code

defense · arXiv · Oct 20, 2025

Chengquan Guo, Yuzhou Nie, Chulin Xie et al. · University of Chicago · UC Santa Barbara +3 more

Blue-teaming agent for CodeGen LLMs that uses automated red teaming to detect malicious instructions and vulnerable code outputs

Prompt Injection nlp

Latest papers