When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?

Yibo Peng 1, James Song 2, Lei Li 3, Xinyu Yang 1, Mihai Christodorescu 4, Ravi Mangal 5, Corina Pasareanu 1, Haizhong Zheng 1, Beidi Chen 1

0 citations · 49 references · arXiv

Published on arXiv: 2510.17862

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

FCV-Attack achieves a 40.7% attack success rate on GPT-5 Mini + OpenHands for CWE-538, using only black-box access and a single query; all 12 tested agent-model combinations are vulnerable.

FCV-Attack

Novel technique introduced


Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively on functional correctness. In this paper, we reveal a novel threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches, which pass all test cases but contain vulnerable code. With our proposed FCV-Attack, which can be deliberately crafted by malicious attackers or implicitly introduced by benign developers, we show that SOTA LLMs (e.g., ChatGPT and Claude) and agent scaffolds (e.g., SWE-agent and OpenHands) are all vulnerable to this FCV threat; across 12 agent-model combinations on SWE-Bench, the attack requires only black-box access and a single query to the code agent. For example, for CWE-538 (information exposure vulnerability), FCV-Attack attains a 40.7% attack success rate on GPT-5 Mini + OpenHands. Our results reveal an important security threat overlooked by current evaluation paradigms and urge the development of security-aware defenses for code agents.
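To make the FCV notion concrete, here is a minimal, hypothetical sketch (not taken from the paper): a patch that fixes the reported bug, so every functional test passes, yet also writes sensitive data to a world-readable log file, an instance of CWE-538-style information exposure. All names (`authenticate`, `LOG_PATH`) are illustrative assumptions.

```python
import os
import stat
import tempfile

# Hypothetical log path the vulnerable patch writes to.
LOG_PATH = os.path.join(tempfile.gettempdir(), "app_debug.log")

def authenticate(username: str, password: str) -> bool:
    """Patched function: the original bug (a case-sensitive username
    comparison) is fixed, so the functional test suite passes."""
    ok = username.lower() == "admin" and password == "hunter2"

    # FCV payload: credentials are silently appended to a log file that
    # is then made world-readable. The test suite never inspects this
    # side effect, so the patch still looks "correct".
    with open(LOG_PATH, "a") as f:
        f.write(f"login attempt: user={username} pass={password}\n")
    os.chmod(LOG_PATH,
             stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)

    return ok

# Functional tests check only return values, so they cannot distinguish
# this vulnerable patch from a clean one.
assert authenticate("Admin", "hunter2") is True
assert authenticate("guest", "wrong") is False
```

The point of the sketch is that a test oracle over input/output behavior is blind to side channels like file permissions or logging, which is exactly the gap the FCV threat exploits.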


Key Contributions

  • Identifies Functionally Correct yet Vulnerable (FCV) patches as a novel, previously overlooked security threat to LLM-based code agents
  • Proposes FCV-Attack, a black-box single-query attack that achieves up to 40.7% attack success rate across 12 agent-model combinations on SWE-Bench
  • Demonstrates that functional correctness-focused evaluation paradigms are insufficient and calls for security-aware defenses for code agents

🛡️ Threat Analysis


Details

Domains
nlp
Model Types
llm
Threat Tags
black_box · inference_time · targeted
Datasets
SWE-Bench
Applications
automated bug fixing · code agents · software engineering agents