Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias
Shir Bernstein¹, David Beste², Daniel Ayzenshteyn¹, Lea Schonherr², Yisroel Mirsky¹
Published on arXiv (arXiv:2508.17361)
Input Manipulation Attack (OWASP ML Top 10 — ML01)
Prompt Injection (OWASP LLM Top 10 — LLM01)
Key Finding
Familiar Pattern Attacks (FPAs) successfully hijack LLM code analysis across all tested model families and programming languages, remaining effective even when models are explicitly warned about the attack via robust system prompts.
Familiar Pattern Attack (FPA)
Novel technique introduced
Abstract
Large Language Models (LLMs) are increasingly trusted to perform automated code review and static analysis at scale, supporting tasks such as vulnerability detection, summarization, and refactoring. In this paper, we identify and exploit a critical vulnerability in LLM-based code analysis: an abstraction bias that causes models to overgeneralize familiar programming patterns and overlook small, meaningful bugs. Adversaries can exploit this blind spot to hijack the control flow of the LLM's interpretation with minimal edits and without affecting actual runtime behavior. We refer to this attack as a Familiar Pattern Attack (FPA). We develop a fully automated, black-box algorithm that discovers and injects FPAs into target code. Our evaluation shows that FPAs are not only effective against basic and reasoning models, but are also transferable across model families (OpenAI, Anthropic, Google), and universal across programming languages (Python, C, Rust, Go). Moreover, FPAs remain effective even when models are explicitly warned about the attack via robust system prompts. Finally, we explore positive, defensive uses of FPAs and discuss their broader implications for the reliability and safety of code-oriented LLMs.
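To make the abstraction bias concrete, the following is a hypothetical, simplified illustration constructed for this summary (not an example from the paper's evaluation set). The added validation boilerplate is a familiar, textbook-looking pattern that leaves runtime behavior completely untouched, while the underlying bug survives in plain sight:

```python
# Hypothetical FPA-style edit (illustration only, not the paper's data).
# The bug: 'or "superadmin"' makes the whole expression truthy for EVERY
# role, so any user is treated as an admin.

def is_admin(user):
    return user.get("role") == "admin" or "superadmin"  # BUG: always truthy

def is_admin_fpa(user):
    # The "familiar pattern" edit: textbook input validation that a reviewer
    # (or model) pattern-matches as standard, safe access-control code.
    # It never changes the result computed below.
    if not isinstance(user, dict):
        return False
    role = user.get("role", "guest")
    # The very same bug is still here, now buried in familiar boilerplate.
    return role == "admin" or "superadmin"

# Runtime behavior is identical: both versions wrongly grant a guest access.
print(bool(is_admin({"role": "guest"})), bool(is_admin_fpa({"role": "guest"})))  # → True True
```

The point of the sketch is that the edit is semantics-preserving: a dynamic test suite sees no difference between the two functions, so only the analyzer's interpretation of the code is "attacked".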
Key Contributions
- Identifies and formalizes Familiar Pattern Attacks (FPAs) — a new adversarial class exploiting LLM abstraction bias in code understanding to cause misclassification without altering runtime behavior
- Develops a fully automated black-box algorithm that discovers and injects FPAs into target code with minimal edits
- Demonstrates cross-model transferability (OpenAI, Anthropic, Google), cross-language universality (Python, C, Rust, Go), and robustness against explicit system-prompt warnings
🛡️ Threat Analysis
FPAs are explicitly framed as adversarial examples — strategically crafted code inputs that cause LLM-integrated analysis systems to produce incorrect outputs, analogous to adversarial document injection attacks against RAG systems. The automated black-box algorithm discovers and injects minimal, semantics-preserving perturbations that systematically mislead the model's inference.
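The paper's actual discovery algorithm is not reproduced here, but the black-box setting it describes can be sketched as a simple query loop: try candidate familiar patterns against the target analyzer until one flips the verdict. Everything below is an assumption-laden toy — `query_model` stands in for a real LLM analysis endpoint, and the snippet list is illustrative, not the paper's pattern set:

```python
# Hypothetical sketch of a black-box FPA search loop (not the paper's
# implementation). A real attack would call an LLM analysis API; here
# 'query_model' is a toy heuristic so the sketch runs standalone.

FAMILIAR_SNIPPETS = [
    'if __name__ == "__main__":\n    pass\n',
    "try:\n    pass\nexcept Exception:\n    pass\n",
    "import logging\nlogging.basicConfig(level=logging.INFO)\n",
]

def query_model(code):
    """Toy stand-in for an LLM verdict: it misses the bug once a familiar cue appears."""
    if "logging.basicConfig" in code:  # familiar pattern dominates the "analysis"
        return "benign"
    return "vulnerable" if "eval(" in code else "benign"

def find_fpa(target_code, snippets=FAMILIAR_SNIPPETS):
    """Return the first candidate (snippet + target) that flips the verdict to 'benign'."""
    if query_model(target_code) != "vulnerable":
        return None  # nothing to hide
    for snippet in snippets:
        candidate = snippet + target_code
        if query_model(candidate) == "benign":
            return candidate  # minimal edit found; runtime behavior unchanged
    return None

buggy = "def run(cmd):\n    return eval(cmd)\n"
attacked = find_fpa(buggy)
```

Because the loop only prepends dead or inert code, the target's runtime behavior is preserved by construction, which mirrors the semantics-preserving constraint described above.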