defense 2026

Tracking Capabilities for Safer Agents

Martin Odersky, Yaoyu Zhao, Yichen Xu, Oliver Bračevac, Cao Nguyen Pham



Published on arXiv

2603.00991

Excessive Agency

OWASP LLM Top 10 — LLM08

Insecure Plugin Design

OWASP LLM Top 10 — LLM07

Key Finding

Agents can generate capability-safe Scala code with no significant loss in task performance, while the static type system reliably prevents unsafe behaviors, including information leakage and malicious side effects.

TACIT (Tracked Agent Capabilities In Types)

Novel technique introduced


AI agents that interact with the real world through tool calls pose fundamental safety challenges: agents might leak private information, cause unintended side effects, or be manipulated through prompt injection. To address these challenges, we propose to put the agent in a programming-language-based "safety harness": instead of calling tools directly, agents express their intentions as code in a capability-safe language, Scala 3 with capture checking. Capabilities are program variables that regulate access to effects and resources of interest. Scala's type system tracks capabilities statically, providing fine-grained control over what an agent can do. In particular, it enables local purity: the ability to enforce that sub-computations are side-effect-free, preventing information leakage when agents process classified data. We demonstrate that extensible agent safety harnesses can be built by leveraging a strong type system with tracked capabilities. Our experiments show that agents can generate capability-safe code with no significant loss in task performance, while the type system reliably prevents unsafe behaviors such as information leakage and malicious side effects.
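As a rough illustration of the mechanism the abstract describes, the sketch below assumes Scala 3 with the experimental `captureChecking` language import; the `Console` capability, `runPure`, and `withConsole` are hypothetical names chosen for illustration, not the paper's actual harness API.

```scala
import language.experimental.captureChecking

// A capability: holding a Console^ value is the only way to log.
trait Console:
  def log(msg: String): Unit

// `Int -> Int` is the pure function arrow: the argument may not
// capture any capability, so its body can have no side effects.
def runPure(f: Int -> Int): Int = f(1)

// Hands the body a tracked Console capability (note the `^`).
def withConsole[T](body: Console^ => T): T =
  body(new Console { def log(msg: String) = println(msg) })

@main def demo =
  withConsole { console =>
    console.log("ok: console capability is in scope here")
    // runPure(x => { console.log(x.toString); x })
    //   would be rejected by the capture checker: the lambda
    //   captures `console`, so it does not conform to Int -> Int
    println(runPure(x => x + 1))
  }
```

Because `runPure` demands the pure arrow `->`, the commented-out call is a compile-time error rather than a runtime check, which is the sense in which the type system statically prevents unsafe behavior.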


Key Contributions

  • Capability-safe agent safety harness (TACIT) using Scala 3 capture checking to statically track and enforce fine-grained agent permissions over tools and effects
  • Local purity enforcement that prevents information leakage by ensuring sub-computations processing sensitive/classified data are provably side-effect-free
  • Empirical validation showing LLM agents generate capability-safe code with no significant degradation in task performance while the type system blocks unsafe behaviors
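The local-purity contribution can be sketched in the same setting; `processClassified` and the commented-out exfiltration attempt are hypothetical illustrations under the same experimental capture-checking assumption, not the TACIT implementation itself.

```scala
import language.experimental.captureChecking

// Local purity: a sub-computation over sensitive data must be a pure
// function (`String -> R`), so it cannot capture tool capabilities
// and therefore cannot leak the secret through a side effect.
def processClassified[R](secret: String)(f: String -> R): R = f(secret)

@main def purityDemo =
  // Allowed: the lambda captures nothing, so it is pure.
  val summary = processClassified("CLASSIFIED: payload")(s => s.length)
  println(summary)
  // Rejected at compile time if `net` were a tracked network capability:
  //   processClassified(secret)(s => { net.send(s); s.length })
```

The secret never escapes because any side-effecting channel would have to appear as a captured capability in the lambda's type, violating the pure arrow.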



Details

  • Domains: nlp
  • Model Types: llm
  • Threat Tags: inference_time
  • Applications: ai agents, mcp-based tool calling, code agents