Latest papers

2 papers
benchmark arXiv Feb 15, 2026

When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift

Max Fomin · Zenity

Leave-one-dataset-out (LODO) evaluation exposes 8.4 percentage points of AUC inflation in prompt injection classifiers and reveals that production guardrails miss 63–93% of indirect attacks

Prompt Injection nlp

attack arXiv Feb 2, 2026

Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software

Tomer Kordonsky, Maayan Yamin, Noam Benzimra et al. · Technion – Israel Institute of Technology · Zenity

Exploits recurring templates in LLM-generated code to predict hidden backend vulnerabilities from observable frontend features in a black-box attack

Sensitive Information Disclosure nlp