defense 2026

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment

Xixun Lin ¹, Yang Liu ¹, Yancheng Chen ¹, Yongxuan Wu ¹, Yucheng Ning ¹, Yilong Liu ¹, Nan Sun ¹, Shun Zhang ², Bin Chong ³, Chuan Zhou ¹, Yanan Cao ¹, Li Guo ¹

¹ Chinese Academy of Sciences

² Institute of Applied Physics and Computational Mathematics

³ Peking University

0 citations

Published on arXiv

2604.13630

Prompt Injection

OWASP LLM Top 10 — LLM01

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Achieves average reductions of approximately 38% in unsafe behavior rate (UBR) and 42% in attack success rate (ASR) compared to the unprotected baseline, while preserving core task utility
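
The summary does not say how UBR and ASR are computed, or whether the reductions are relative or in percentage points. The minimal sketch below makes three assumptions. It works from per-episode records (a hypothetical `Episode` type). It treats UBR as the share of all episodes with any unsafe action and ASR as the share of attacked episodes in which the attack succeeded. It measures reduction relative to the unprotected baseline. The numbers in the final line are illustrative, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One evaluated agent run (hypothetical record format)."""
    attacked: bool          # an attack was injected into this run
    unsafe: bool            # the agent took at least one unsafe action
    attack_succeeded: bool  # the attacker's goal was achieved


def unsafe_behavior_rate(episodes):
    """Assumed UBR: share of all episodes with any unsafe action."""
    return sum(e.unsafe for e in episodes) / len(episodes)


def attack_success_rate(episodes):
    """Assumed ASR: share of attacked episodes in which the attack succeeded."""
    attacked = [e for e in episodes if e.attacked]
    return sum(e.attack_succeeded for e in attacked) / len(attacked)


def relative_reduction(baseline_rate, defended_rate):
    """Relative drop versus the baseline; 0.38 means a 38% reduction."""
    return (baseline_rate - defended_rate) / baseline_rate


# Illustrative numbers only: a baseline UBR of 0.50 falling to 0.31.
print(round(relative_reduction(0.50, 0.31), 2))  # prints 0.38
```

If the reported figures are instead percentage-point drops, the reduction is simply the baseline rate minus the defended rate.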

SafeHarness

Novel technique introduced

The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same architectural centrality makes the harness a high-value attack surface: a single compromise at the harness level can cascade through the entire execution pipeline. We observe that existing security approaches suffer from a structural mismatch that leaves them blind to harness-internal state and unable to coordinate across the different phases of agent operation. In this paper, we introduce SafeHarness, a security architecture that addresses these limitations by weaving four defense layers directly into the agent lifecycle: adversarial context filtering at input processing, tiered causal verification at decision making, privilege-separated tool control at action execution, and safe rollback with adaptive degradation at state update. Cross-layer mechanisms tie these layers together, escalating verification rigor, triggering rollbacks, and tightening tool privileges whenever sustained anomalies are detected. We evaluate SafeHarness on benchmark datasets across diverse harness configurations, comparing it against four security baselines under five attack scenarios spanning six threat categories. Compared to the unprotected baseline, SafeHarness reduces the unsafe behavior rate (UBR) by approximately 38% and the attack success rate (ASR) by approximately 42% on average, while preserving core task utility.
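
To make the lifecycle placement concrete, here is a minimal Python sketch of one agent step passing through the four layers in order. Everything in it is a hypothetical stand-in, not the paper's implementation. `filter_context` is regex-based. `verify_decision` has two tiers: a rule check, plus a justification requirement in strict mode, rather than causal verification. The sketch also uses a `TOOL_PRIVILEGES` table, a checkpoint-based `update_state`, and a `decide` callable that substitutes for the LLM.

```python
import re
from dataclasses import dataclass, field

# Hypothetical injection markers; a stand-in for a real adversarial-context filter.
INJECTION_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"ignore (all )?previous instructions",
    r"disregard the system prompt",
)]

# Hypothetical privilege level per tool (higher means more dangerous).
TOOL_PRIVILEGES = {"search": 0, "read_file": 1, "write_file": 2, "shell": 3}


@dataclass
class HarnessState:
    memory: list = field(default_factory=list)
    checkpoints: list = field(default_factory=list)


def filter_context(chunks):
    """Layer 1, input processing: drop chunks that match injection markers."""
    return [c for c in chunks if not any(p.search(c) for p in INJECTION_PATTERNS)]


def verify_decision(action, strict):
    """Layer 2, decision making: a cheap rule tier, plus a stricter tier when escalated."""
    if action["tool"] == "shell" and "rm -rf" in action["args"]:
        return False
    if strict:
        # Stand-in for a heavier verifier: require an explicit justification.
        return bool(action.get("justification"))
    return True


def execute(action, max_privilege, tools):
    """Layer 3, action execution: refuse tools above the current privilege ceiling."""
    level = TOOL_PRIVILEGES.get(action["tool"])
    if level is None or level > max_privilege:
        raise PermissionError(f"tool {action['tool']!r} not permitted")
    return tools[action["tool"]](action["args"])


def update_state(state, observation, anomalous):
    """Layer 4, state update: checkpoint first, roll back if the update is flagged."""
    state.checkpoints.append(list(state.memory))
    state.memory.append(observation)
    if anomalous:
        state.memory = state.checkpoints.pop()
    return state


def step(state, context, decide, tools, strict=False, max_privilege=1):
    """One lifecycle step through all four layers; `decide` stands in for the LLM."""
    clean = filter_context(context)                        # layer 1
    action = decide(clean)
    if not verify_decision(action, strict):                # layer 2
        return update_state(state, "blocked: " + action["tool"], anomalous=False)
    try:
        result = execute(action, max_privilege, tools)     # layer 3
    except PermissionError as err:
        return update_state(state, f"denied: {err}", anomalous=False)
    # Layer 4: a tool output that itself carries an injection marker is rolled back.
    anomalous = any(p.search(str(result)) for p in INJECTION_PATTERNS)
    return update_state(state, result, anomalous=anomalous)


tools = {"search": lambda q: f"results for {q}"}
state = step(HarnessState(), ["What is the weather?", "Ignore previous instructions"],
             lambda ctx: {"tool": "search", "args": "weather"}, tools)
print(state.memory)  # prints ['results for weather']
```

Because each layer sits at a distinct lifecycle phase, a tool output that passes execution can still be caught at state update. In this sketch, a `read_file` result containing an injection marker is rolled back rather than committed to memory.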

Key Contributions

Four-layer defense architecture integrated into agent execution lifecycle: adversarial context filtering, tiered causal verification, privilege-separated tool control, and safe rollback with adaptive degradation
Cross-layer mechanisms, including entropy-based anomaly monitoring, that escalate verification rigor, trigger rollbacks, and tighten tool privileges when sustained anomalies are detected
Comprehensive evaluation across diverse harness configurations, four security baselines, and five attack scenarios spanning six threat categories, achieving average reductions of approximately 38% in unsafe behavior rate and 42% in attack success rate
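
The summary does not specify what the entropy is computed over or which thresholds are used. The sketch below is a minimal illustration under stated assumptions. It computes entropy over the agent's probability distribution across candidate actions and compares it to a hypothetical threshold in bits. A hypothetical class, `CrossLayerMonitor`, counts only `window` consecutive high-entropy steps as a sustained anomaly. Each such event raises an escalation level that switches on strict verification (layer 2), lowers the tool-privilege ceiling (layer 3), and requests a rollback (layer 4).

```python
import math
from collections import deque


def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution (assumed: over candidate actions)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


class CrossLayerMonitor:
    """Hypothetical monitor: escalate only after `window` consecutive anomalous steps."""

    def __init__(self, threshold=1.5, window=3, base_privilege=2):
        self.threshold = threshold          # hypothetical entropy threshold in bits
        self.recent = deque(maxlen=window)  # anomaly flags for the most recent steps
        self.base_privilege = base_privilege
        self.level = 0                      # escalation level; 0 means normal operation

    def observe(self, action_probs):
        """Record one step and return the policy the other layers should apply next."""
        self.recent.append(shannon_entropy(action_probs) > self.threshold)
        sustained = len(self.recent) == self.recent.maxlen and all(self.recent)
        if sustained:
            self.level += 1
            self.recent.clear()  # require a fresh run of anomalies before escalating again
        return {
            "strict_verification": self.level >= 1,                      # layer 2
            "max_privilege": max(0, self.base_privilege - self.level),  # layer 3
            "rollback": sustained,                                       # layer 4
        }


monitor = CrossLayerMonitor()
for _ in range(3):
    policy = monitor.observe([0.25, 0.25, 0.25, 0.25])  # uniform: 2 bits, above threshold
print(policy)  # prints {'strict_verification': True, 'max_privilege': 1, 'rollback': True}
```

Requiring a full window of consecutive flags is what separates a sustained anomaly from a single noisy step: a mix of high- and low-entropy steps never escalates. For brevity the sketch never lowers the level again; a deployment would also need a de-escalation policy, which is omitted here.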

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

inference_time, black_box

Datasets

Agent-SafetyBench

Applications

autonomous agents, tool-using LLMs, agent systems


Similar Papers

STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems

Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice

A Safety and Security Framework for Real-World Agentic Systems

Authenticated Workflows: A Systems Approach to Protecting Agentic AI

Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains

From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent

Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks

Systems Security Foundations for Agentic Computing