Latest papers

2 papers
tool · arXiv · Feb 9, 2026

MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

Georgios Syros, Evan Rose, Brian Grinstead et al. · Northeastern University · Mozilla Corporation

Automated red-teaming framework that adaptively discovers indirect prompt injection attacks against LLM web agents via trajectory analysis

Prompt Injection · Excessive Agency · nlp
defense · arXiv · Sep 15, 2025

SENTRA: Selected-Next-Token Transformer for LLM Text Detection

Mitchell Plyler, Yilun Zhang, Alexander Tuzhilin et al. · Mozilla Corporation · Ciphero AI +1 more

Novel Transformer detector that uses selected-next-token probabilities and contrastive pre-training to identify LLM-generated text, including on out-of-domain data

Output Integrity Attack · nlp