
CoTDeceptor: Adversarial Code Obfuscation Against CoT-Enhanced LLM Code Agents

Haoyang Li 1, Mingjin Li, Jinxin Zuo 2,3, Siqi Li 1, Xiao Li 4, Hao Wu 4, Yueming Lu 1, Xiaochuan He 5

0 citations · 39 references · arXiv


Published on arXiv · 2512.21250

Input Manipulation Attack

OWASP ML Top 10 — ML01

Prompt Injection

OWASP LLM Top 10 — LLM01

Key Finding

CoTDeceptor bypasses 14 out of 15 vulnerability categories against state-of-the-art CoT-enhanced LLMs, compared to only 2 bypassed by prior obfuscation methods.

CoTDeceptor

Novel technique introduced


LLM-based code agents (e.g., ChatGPT Codex) are increasingly deployed as detectors for code review and security auditing tasks. Although CoT-enhanced LLM vulnerability detectors are believed to offer improved robustness against obfuscated malicious code, we find that their reasoning chains and semantic abstraction processes exhibit exploitable, systematic weaknesses. This allows attackers to covertly embed malicious logic, bypass code review, and propagate backdoored components through real-world software supply chains. To investigate this issue, we present CoTDeceptor, the first adversarial code obfuscation framework targeting CoT-enhanced LLM detectors. CoTDeceptor autonomously constructs evolving, hard-to-reverse multi-stage obfuscation strategy chains that disrupt CoT-driven detection logic. Evaluated on malicious code samples provided by a security enterprise, CoTDeceptor achieves stable and transferable evasion against state-of-the-art LLMs and vulnerability detection agents, bypassing 14 out of 15 vulnerability categories, compared to only 2 bypassed by prior methods. Our findings highlight potential risks in real-world software supply chains and underscore the need for more robust and interpretable LLM-powered security analysis systems.
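To make the idea of a multi-stage obfuscation strategy chain concrete, here is a minimal, purely illustrative sketch. The stage functions (`rename_identifiers`, `encode_strings`, `split_control_flow`) and their composition are our own assumptions for exposition, not CoTDeceptor's actual implementation; the paper's chains are constructed autonomously and are far more sophisticated. The sketch only shows the general pattern: each stage removes one surface cue a reasoning-based detector might latch onto, and the composed result is harder to reverse than any single stage.

```python
import base64

# Hypothetical stages of an obfuscation "strategy chain" (illustrative only).

def rename_identifiers(code: str) -> str:
    """Stage 1: replace a telltale identifier with an innocuous one."""
    return code.replace("exfiltrate", "sync_metrics")

def encode_strings(code: str) -> str:
    """Stage 2: hide a suspicious string literal behind base64 decoding."""
    url = "http://evil.example/upload"
    encoded = base64.b64encode(url.encode()).decode()
    return code.replace(
        f'"{url}"',
        f'__import__("base64").b64decode("{encoded}").decode()',
    )

def split_control_flow(code: str) -> str:
    """Stage 3: route the call through a dynamically assembled name."""
    return code.replace(
        "sync_metrics(data)",
        'globals()["sync" + "_metrics"](data)',
    )

def obfuscate(code: str, chain) -> str:
    """Apply each stage in order, composing a multi-stage chain."""
    for stage in chain:
        code = stage(code)
    return code

malicious = 'exfiltrate(data)\nurl = "http://evil.example/upload"'
obfuscated = obfuscate(
    malicious, [rename_identifiers, encode_strings, split_control_flow]
)
print(obfuscated)
```

After the chain runs, neither the identifier `exfiltrate` nor the plaintext URL appears in the output, so a detector reasoning over surface features has less to anchor on; CoTDeceptor's contribution is discovering such chains automatically against the detector's exposed reasoning.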


Key Contributions

  • CoTDeceptor: the first adversarial code obfuscation framework specifically targeting CoT-enhanced LLM vulnerability detectors by exploiting their exposed reasoning chains
  • Multi-stage evolving obfuscation strategy chains that are hard to reverse and autonomously constructed without expert effort
  • Demonstrated transferable evasion across 14/15 vulnerability categories and multiple SOTA LLMs, far exceeding prior methods (2/15)

🛡️ Threat Analysis

Input Manipulation Attack

CoTDeceptor crafts adversarial obfuscated code inputs that cause LLM-based vulnerability detectors to misclassify (fail to detect) malicious code at inference time — a classic evasion/input-manipulation attack achieving 14/15 vulnerability category bypasses.


Details

Domains
nlp
Model Types
llm · transformer
Threat Tags
black_box · inference_time · targeted · digital
Applications
code vulnerability detection · llm-based code review · security auditing · ci pipeline security