Defense · 2026

SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization

Hao Wang 1, Niels Mündler 2, Mark Vero 2, Jingxuan He 1, Dawn Song 1, Martin Vechev 2



Published on arXiv: 2604.03587

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

Improves QwQ 32B's rate of secure and functionally correct generations from 48.2% to 62.2% on CWEval and from 18.2% to 22.0% on BaxBench; even when trained only on injection CWEs, it yields a 9.9% improvement on held-out memory-safety CWEs

SecPI

Novel technique introduced


Reasoning language models (RLMs) are increasingly used in programming. Yet, even state-of-the-art RLMs frequently introduce critical security vulnerabilities in generated code. Prior training-based approaches for secure code generation face a critical limitation that prevents their direct application to RLMs: they rely on costly, manually curated security datasets covering only a limited set of vulnerabilities. At the inference level, generic security reminders consistently degrade functional correctness while triggering only shallow, ad hoc vulnerability analysis. To address these problems, we present SecPI, a fine-tuning pipeline that teaches RLMs to internalize structured security reasoning, producing secure code by default without any security instructions at inference time. SecPI filters existing general-purpose coding datasets for security-relevant tasks using an LLM-based classifier, generates high-quality security reasoning traces with a teacher model guided by a structured prompt that systematically enumerates relevant CWEs and mitigations, and fine-tunes the target model on pairs of security-prompt-free inputs and teacher reasoning traces; as a result, the model learns to reason about security autonomously rather than in response to explicit instructions. An extensive evaluation on security benchmarks with state-of-the-art open-weight reasoning models validates the effectiveness of our approach. For instance, SecPI improves the percentage of functionally correct and secure generations for QwQ 32B from 48.2% to 62.2% (+14.0 points) on CWEval and from 18.2% to 22.0% on BaxBench. Further investigation also reveals strong cross-CWE and cross-language generalization beyond the training vulnerabilities: even when trained only on injection-related CWEs, QwQ 32B generates correct and secure code 9.9% more frequently on held-out memory-safety CWEs.
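The pipeline described above has three data-side stages: filter a general-purpose coding dataset for security-relevant tasks, have a teacher model produce CWE-guided reasoning traces, and assemble fine-tuning pairs whose input side carries no security instruction. A minimal sketch of that flow, with stand-in keyword filtering and a stub teacher in place of the paper's actual LLM classifier and teacher model (all function names and the prompt template here are illustrative assumptions, not the authors' implementation):

```python
# Hypothetical sketch of the SecPI data pipeline. The classifier, teacher,
# and prompt template are stand-ins, not the authors' actual components.

SECURITY_KEYWORDS = ("sql", "password", "upload", "exec", "path", "deserialize")

def is_security_relevant(task: str) -> bool:
    # Stand-in for the LLM-based classifier that filters a general-purpose
    # coding dataset down to security-relevant tasks.
    text = task.lower()
    return any(keyword in text for keyword in SECURITY_KEYWORDS)

def teacher_trace(task: str) -> str:
    # Stand-in for the teacher model: a structured prompt asks it to
    # enumerate relevant CWEs and mitigations before solving the task,
    # yielding a security reasoning trace plus the final secure solution.
    structured_prompt = (
        "Enumerate CWEs that could apply to this task, choose mitigations, "
        f"then solve it securely:\n{task}"
    )
    return f"<think>[reasoning over: {structured_prompt}]</think>[secure code]"

def build_sft_pairs(dataset):
    # Key detail from the abstract: the input side of each pair contains NO
    # security prompt, so the fine-tuned model learns to reason about
    # security by default rather than on explicit request.
    pairs = []
    for task in dataset:
        if is_security_relevant(task):
            pairs.append({"input": task, "target": teacher_trace(task)})
    return pairs
```

The resulting pairs would then feed a standard supervised fine-tuning run on the target RLM; the security signal lives entirely in the target-side reasoning traces.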


Key Contributions

  • SecPI fine-tuning pipeline that teaches RLMs to internalize structured security reasoning without requiring security prompts at inference
  • LLM-based filtering of general coding datasets for security-relevant tasks and generation of high-quality security reasoning traces guided by CWE enumeration
  • Demonstrates strong cross-CWE and cross-language generalization: models trained only on injection vulnerabilities improve on held-out memory-safety vulnerabilities

Details

Domains: nlp
Model Types: llm, transformer
Threat Tags: inference_time
Datasets: CWEval, BaxBench
Applications: code generation, secure programming assistants