attack 2025

Red Teaming Program Repair Agents: When Correct Patches can Hide Vulnerabilities

Simin Chen ¹, Yixin He ², Suman Jana ¹, Baishakhi Ray ¹

¹ Columbia University

² University of Southern California

2 citations · 54 references · arXiv

Published on arXiv

2509.25894

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

SWExploit achieves an attack success rate (ASR) of up to 0.91 on functionally correct patches, whereas all baseline ASRs are below 0.20, in an evaluation spanning three APR agent pipelines and five backend LLMs.

SWExploit

Novel technique introduced

LLM-based agents are increasingly deployed for software maintenance tasks such as automated program repair (APR). APR agents automatically fetch GitHub issues and use backend LLMs to generate patches that fix the reported bugs. However, existing work focuses primarily on the functional correctness of APR-generated patches, that is, whether they pass hidden or regression tests, and largely ignores potential security risks. Given the openness of platforms like GitHub, where any user can raise issues and participate in discussions, an important question arises: can an adversarial user submit a valid issue on GitHub that misleads an LLM-based agent into generating a functionally correct but vulnerable patch? To answer this question, we propose SWExploit, which generates adversarial issue statements designed to make APR agents produce patches that are functionally correct yet vulnerable. SWExploit operates in three main steps: (1) program analysis to identify potential injection points for vulnerable payloads; (2) adversarial issue generation to provide misleading reproduction and error information while preserving the original issue semantics; and (3) iterative refinement of the adversarial issue statements based on the outputs of the APR agents. Empirical evaluation on three agent pipelines and five backend LLMs shows that SWExploit can produce patches that are both functionally correct and vulnerable: the attack success rate (ASR) on correct patches reaches 0.91, whereas all baseline ASRs are below 0.20. Based on our evaluation, we are the first to challenge the traditional assumption that a patch passing all tests is inherently reliable and secure, highlighting critical limitations in the current evaluation paradigm for APR agents.
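The abstract reports an attack success rate on correct patches without giving the formula on this page. The sketch below shows one plausible reading, assuming the metric is the fraction of functionally correct patches that also contain the injected vulnerability. The `PatchOutcome` type, the function name, and the sample data are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PatchOutcome:
    """Hypothetical record for one generated patch (not the paper's schema)."""
    passes_tests: bool  # patch is functionally correct (passes the test suite)
    vulnerable: bool    # patch contains the targeted vulnerability


def asr_on_correct_patches(outcomes: Sequence[PatchOutcome]) -> float:
    """Assumed definition: share of functionally correct patches that are vulnerable."""
    correct = [o for o in outcomes if o.passes_tests]
    if not correct:
        # No correct patches: report 0.0 rather than dividing by zero.
        return 0.0
    return sum(o.vulnerable for o in correct) / len(correct)


# Made-up outcomes for illustration only.
sample = [
    PatchOutcome(passes_tests=True, vulnerable=True),
    PatchOutcome(passes_tests=True, vulnerable=False),
    PatchOutcome(passes_tests=False, vulnerable=True),  # excluded: fails tests
    PatchOutcome(passes_tests=True, vulnerable=True),
]
print(asr_on_correct_patches(sample))
```

Under this reading, patches that fail the tests do not count toward the denominator, so the metric isolates the case the paper highlights: patches a test-only evaluation would accept.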

Key Contributions

SWExploit: a three-stage attack (program analysis, adversarial issue generation, iterative refinement) that produces adversarial GitHub issues causing APR agents to generate functionally correct but security-vulnerable patches
First empirical demonstration that a patch passing all functional tests can systematically hide injected vulnerabilities, with attack success rates reaching 0.91 across three agent pipelines and five backend LLMs
Challenges the dominant APR evaluation paradigm that equates test-passing patches with reliability and security
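The last contribution suggests that an evaluation should gate patches on security as well as on tests. The following is a minimal sketch of such a two-gate check, not the paper's evaluation harness. The `evaluate_patch` function and both toy checks are hypothetical, and the string matching stands in for a real test suite and a real security analyzer.

```python
from typing import Callable, Dict, Iterable

# A check takes the patch text and returns True when the patch passes it.
Check = Callable[[str], bool]


def evaluate_patch(
    patch: str,
    functional_tests: Iterable[Check],
    security_checks: Iterable[Check],
) -> Dict[str, bool]:
    """Accept a patch only if it passes both functional and security gates."""
    functional_ok = all(test(patch) for test in functional_tests)
    security_ok = all(check(patch) for check in security_checks)
    return {
        "functional_ok": functional_ok,
        "security_ok": security_ok,
        "accepted": functional_ok and security_ok,
    }


# Toy stand-ins (assumptions): real pipelines would run tests and a scanner.
def toy_functional_test(patch: str) -> bool:
    return "return result" in patch


def toy_security_check(patch: str) -> bool:
    return "shell=True" not in patch


candidate = "result = subprocess.run(cmd, shell=True)\nreturn result"
report = evaluate_patch(candidate, [toy_functional_test], [toy_security_check])
print(report)
```

A test-only evaluation would look at `functional_ok` alone and accept this candidate; the combined gate rejects it because the toy security check fails.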

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_box, inference_time, targeted

Datasets

SWE-bench

Applications

automated program repair, llm-based software maintenance agents, github issue-driven code patching


Similar Papers

From Storage to Steering: Memory Control Flow Attacks on LLM Agents

Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections

Many-to-One Adversarial Consensus: Exposing Multi-Agent Collusion Risks in AI-Based Healthcare

Tipping the Dominos: Topology-Aware Multi-Hop Attacks on LLM-Based Multi-Agent Systems

David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning

BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?

MURMUR: Using cross-user chatter to break collaborative language agents in groups

Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS