defense 2025

Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs

0 citations · 53 references · arXiv

Published on arXiv

2512.03720

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

State-of-the-art models including GPT-4o and o3-mini show >90% attack success rates against TCA; CAHL significantly reduces susceptibility while preserving zero-shot generalization on generic tasks.

CAHL (Context-Aware Hierarchical Learning)

Novel technique introduced

Large Language Models (LLMs) have emerged as powerful tools for diverse applications. However, their uniform token processing paradigm introduces critical vulnerabilities in instruction handling, particularly when exposed to adversarial scenarios. In this work, we identify and propose a novel class of vulnerabilities, termed Tool-Completion Attack (TCA), which exploits function-calling mechanisms to subvert model behavior. To evaluate LLM robustness against such threats, we introduce the Tool-Completion benchmark, a comprehensive security assessment framework, which reveals that even state-of-the-art models remain susceptible to TCA, with surprisingly high attack success rates. To address these vulnerabilities, we introduce Context-Aware Hierarchical Learning (CAHL), a sophisticated mechanism that dynamically balances semantic comprehension with role-specific instruction constraints. CAHL leverages the contextual correlations between different instruction segments to establish a robust, context-aware instruction hierarchy. Extensive experiments demonstrate that CAHL significantly enhances LLM robustness against both conventional attacks and the proposed TCA, exhibiting strong generalization capabilities in zero-shot evaluations while still preserving model performance on generic tasks. Our code is available at https://github.com/S2AILab/CAHL.

Key Contributions

Identifies and formalizes Tool-Completion Attack (TCA), a novel prompt injection vulnerability that exploits function-calling mechanisms to make adversarial instructions appear semantically legitimate
Introduces the Tool-Completion benchmark, revealing that GPT-4o and o3-mini exhibit >90% attack success rates against TCA
Proposes Context-Aware Hierarchical Learning (CAHL), a two-stage training mechanism that enforces context-aware instruction hierarchies and significantly reduces vulnerability to both TCA and conventional prompt injection attacks

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llmtransformer

Threat Tags

inference_timedigitaltargeted

Datasets

Tool-Completion benchmark (proposed)TensorTrustStruQ evaluation sets

Applications

tool-augmented llm agentsllm agentic systemsfunction-calling apis

Read PDF arXiv DOI Code

Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

Defense Against Indirect Prompt Injection via Tool Result Parsing

Quantifying Conversation Drift in MCP via Latent Polytope

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

MindGuard: Intrinsic Decision Inspection for Securing LLM Agents Against Metadata Poisoning

MCP-Guard: A Multi-Stage Defense-in-Depth Framework for Securing Model Context Protocol in Agentic AI

IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents

SMCP: Secure Model Context Protocol

Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE