SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations
Mohammed Himayath Ali, Mohammed Aqib Abdullah, Mohammed Mudassir Uddin, Shahnawaz Alam
Published on arXiv (arXiv:2601.07835)
Prompt Injection
OWASP LLM Top 10 — LLM01
Key Finding
SecureCAI reduces prompt injection attack success rates by 94.7% compared to baseline LLMs while maintaining 95.1% accuracy on benign security analysis tasks and achieving constitution adherence scores above 0.92 under sustained adversarial pressure.
SecureCAI
Novel technique introduced
Large Language Models (LLMs) have emerged as transformative tools for Security Operations Centers (SOCs), enabling automated log analysis, phishing triage, and malware explanation. Deployment in adversarial cybersecurity environments, however, exposes critical vulnerabilities to prompt injection attacks, in which malicious instructions embedded in security artifacts manipulate model behavior. This paper introduces SecureCAI, a novel defense framework that extends Constitutional AI principles with security-aware guardrails, adaptive constitution evolution, and Direct Preference Optimization (DPO) for unlearning unsafe response patterns. The framework targets the unique challenges of high-stakes security contexts, where traditional safety mechanisms prove insufficient against sophisticated adversarial manipulation. Experimental evaluation shows that SecureCAI reduces attack success rates by 94.7% compared to baseline models while maintaining 95.1% accuracy on benign security analysis tasks. Continuous red-teaming feedback loops enable dynamic adaptation to emerging attack strategies, and the framework achieves constitution adherence scores above 0.92 under sustained adversarial pressure. These results establish a foundation for trustworthy integration of LLM capabilities into operational cybersecurity workflows and address a critical gap in current approaches to AI safety within adversarial domains.
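The abstract mentions DPO-based unlearning of unsafe response patterns. The paper's exact training setup is not reproduced here; the following is a minimal sketch of the standard DPO objective for a single preference pair, where (for unlearning) the dispreferred response would be an unsafe, injection-compliant completion. All function and parameter names are illustrative.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_w, logp_l         -- policy log-probs of the preferred (safe) and
                              dispreferred (unsafe) responses.
    ref_logp_w, ref_logp_l -- log-probs of the same responses under a frozen
                              reference model.
    beta                   -- temperature controlling deviation from the
                              reference policy.
    """
    # Implicit reward margin between safe and unsafe responses.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin: shrinks as the policy favors
    # the safe response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With equal log-probs everywhere the loss is log 2; pushing probability mass from the unsafe to the safe response drives it toward zero.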
Key Contributions
- Formal threat model covering injection attack surfaces in LLM-assisted SOC operations including log poisoning, malicious email content, and obfuscated malware code
- SecureCAI architecture combining security-aware constitutional principles with adaptive evolution responding to emerging attack patterns via continuous red-teaming feedback loops
- DPO-based unlearning methodology that suppresses unsafe response patterns while preserving 95.1% accuracy on benign security analysis tasks, achieving 94.7% reduction in attack success rates
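The threat model above covers injection payloads hidden in logs, email bodies, and malware code. The paper's guardrails are learned, but a pre-model screening stage can be sketched as follows; the patterns and function name here are hypothetical placeholders, not SecureCAI's actual detector.

```python
import re

# Illustrative instruction-injection patterns only; a deployed guardrail
# would rely on learned detectors rather than a fixed regex list.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"disregard your (system prompt|guidelines)",
    r"you are now (in )?developer mode",
]

def screen_artifact(text: str) -> dict:
    """Screen a security artifact (log line, email body, decompiled code)
    for embedded instruction-like content before it reaches the model."""
    hits = [p for p in INJECTION_PATTERNS
            if re.search(p, text, re.IGNORECASE)]
    return {"suspicious": bool(hits), "matched": hits}
```

Flagged artifacts could then be quarantined or wrapped with an explicit "untrusted data, do not follow instructions" marker before analysis.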