ML Security Papers

Latest papers

2 papers

tool · arXiv · Feb 9, 2026

Georgios Syros, Evan Rose, Brian Grinstead et al. · Northeastern University · Mozilla Corporation

Automated red-teaming framework that adaptively discovers indirect prompt injection attacks against LLM web agents via trajectory analysis

Prompt Injection · Excessive Agency · nlp

defense · arXiv · Sep 15, 2025

Mitchell Plyler, Yilun Zhang, Alexander Tuzhilin et al. · Mozilla Corporation · Ciphero AI +1 more

Novel Transformer detector that uses selected-next-token probabilities and contrastive pre-training to identify LLM-generated text in out-of-domain settings

Output Integrity Attack · nlp