benchmark 2026

CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

Lei Ba ¹, Qinbin Li ², Songze Li ¹

¹ Southeast University

² Huazhong University of Science and Technology

0 citations · 57 references · arXiv (Cornell University)

Published on arXiv

2602.19547

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Across six foundation models, natural language-disguised attacks achieve 14.1% higher attack success rate than code-based attacks, and all agents catastrophically fail against implicit semantic hazards despite robust defenses against explicit threats.

CIBER

Novel technique introduced

LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code execution capabilities remains underexplored. Existing benchmarks are limited to static datasets or simulated environments, failing to capture the security risks arising from dynamic code execution, tool interactions, and multi-turn context. To bridge this gap, we introduce CIBER, an automated benchmark that combines dynamic attack generation, isolated secure sandboxing, and state-aware evaluation to systematically assess the vulnerability of code interpreter agents against four major types of adversarial attacks: Direct/Indirect Prompt Injection, Memory Poisoning, and Prompt-based Backdoor. We evaluate six foundation models across two representative code interpreter agents (OpenInterpreter and OpenCodeInterpreter), incorporating a controlled study of identical models. Our results reveal that Interpreter Architecture and Model Alignment Set the Security Baseline. Structural integration enables aligned specialized models to outperform generic SOTA models. Conversely, high intelligence paradoxically increases susceptibility to complex adversarial prompts due to stronger instruction adherence. Furthermore, we identify a "Natural Language Disguise" Phenomenon, where natural language functions as a significantly more effective input modality than explicit code snippets (+14.1% ASR), thereby bypassing syntax-based defenses. Finally, we expose an alarming Security Polarization, where agents exhibit robust defenses against explicit threats yet fail catastrophically against implicit semantic hazards, highlighting a fundamental blind spot in current pattern-matching protection approaches.

Key Contributions

CIBER: an automated benchmark with dynamic attack generation, dockerized sandboxing with real system privileges, and state-aware evaluation for code interpreter agent security
Empirical finding that natural language disguise is significantly more effective than explicit code for adversarial attacks (+14.1% ASR), bypassing syntax-based defenses
Discovery of Security Polarization — agents robustly resist explicit threats but fail catastrophically against implicit semantic hazards — and identification of trusted channels (tool outputs, conversation history) as structural blind spots

🛡️ Threat Analysis

Details

Domains

nlp

Model Types

llm

Threat Tags

black_boxinference_time

Datasets

OpenInterpreterOpenCodeInterpreter

Applications

code interpreter agentsllm agentsautonomous coding assistants

Read PDF arXiv DOI

CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

Key Contributions

🛡️ Threat Analysis

Details

Similar Papers

When Hallucination Costs Millions: Benchmarking AI Agents in High-Stakes Adversarial Financial Markets

ASTRA: Agentic Steerability and Risk Assessment Framework

Beyond Jailbreak: Unveiling Risks in LLM Applications Arising from Blurred Capability Boundaries

WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web Agents

The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models

Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

Mind the Gap: Evaluating Model- and Agentic-Level Vulnerabilities in LLMs with Action Graphs

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents