Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection
Published on arXiv
2604.18248
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
Local-alignment detector lifts F1 from 0.033 to 0.378 on deepset with zero additional false positives; stylometric detector adds 11.1pp F1 on indirect injection
prompt-shield
Novel technique introduced
Current open-source prompt-injection detectors converge on two architectural choices: regular-expression pattern matching and fine-tuned transformer classifiers. Both share failure modes that recent work has made concrete. Regular expressions miss paraphrased attacks. Fine-tuned classifiers are vulnerable to adaptive adversaries: a 2025 NAACL Findings study reported that eight published indirect-injection defenses were bypassed with greater than fifty percent attack success rates under adaptive attacks. This work proposes seven detection techniques that each port a specific mechanism from a discipline outside large-language-model security: forensic linguistics, materials-science fatigue analysis, deception technology from network security, local-sequence alignment from bioinformatics, mechanism design from economics, spectral signal analysis from epidemiology, and taint tracking from compiler theory. Three of the seven techniques are implemented in the prompt-shield v0.4.1 release (Apache 2.0) and evaluated in a four-configuration ablation across six datasets including deepset/prompt-injections, NotInject, LLMail-Inject, AgentHarm, and AgentDojo. The local-alignment detector lifts F1 on deepset from 0.033 to 0.378 with zero additional false positives. The stylometric detector adds 11.1 percentage points of F1 on an indirect-injection benchmark. The fatigue tracker is validated via a probing-campaign integration test. All code, data, and reproduction scripts are released under Apache 2.0.
Key Contributions
- Seven novel prompt injection detection techniques ported from forensic linguistics, bioinformatics, network security, and other disciplines
- Local-alignment detector achieves 0.378 F1 on deepset/prompt-injections (11.4x improvement) with zero false positives
- Stylometric detector adds 11.1 percentage points F1 on indirect injection benchmarks
- Open-source implementation in prompt-shield v0.4.1 with full reproducibility