Max Fomin

h-index: 0 · 0 citations · 0 papers (total)

Papers in Database (1)

benchmark arXiv Feb 15, 2026

When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift

Max Fomin · Zenity

LODO evaluation exposes an 8.4 pp AUC inflation in prompt injection classifiers and reveals that production guardrails miss 63–93% of indirect attacks.

Prompt Injection nlp