Mind the Gap: Evaluating Model- and Agentic-Level Vulnerabilities in LLMs with Action Graphs
Ilham Wicaksono 1,2, Zekun Wu 1,2, Rahul Patel 1, Theo King 1, Adriano Koshiyama 1,2, Philip Treleaven 2
Published on arXiv
arXiv:2509.04802
Prompt Injection
OWASP LLM Top 10 — LLM01
Excessive Agency
OWASP LLM Top 10 — LLM08
Key Finding
Tool-calling in agentic contexts exhibits a 24-60% higher attack success rate (ASR) than model-level baselines, and context-aware iterative attacks can compromise objectives that failed at the model level, confirming systematic gaps in traditional LLM safety evaluation.
AgentSeer
Novel technique introduced
As large language models transition to agentic systems, current safety evaluation frameworks face critical gaps in assessing deployment-specific risks. We introduce AgentSeer, an observability-based evaluation framework that decomposes agentic executions into granular action and component graphs, enabling systematic agentic-situational assessment. Through cross-model validation on GPT-OSS-20B and Gemini-2.0-flash using HarmBench single-turn and iterative refinement attacks, we demonstrate fundamental differences between model-level and agentic-level vulnerability profiles. Model-level evaluation reveals baseline differences: GPT-OSS-20B (39.47% ASR) versus Gemini-2.0-flash (50.00% ASR), with both models susceptible to social engineering while maintaining resistance to logic-based attacks. However, agentic-level assessment exposes agent-specific risks invisible to traditional evaluation. We discover "agentic-only" vulnerabilities that emerge exclusively in agentic contexts, with tool-calling showing 24-60% higher ASR across both models. Cross-model analysis reveals universal agentic patterns: agent transfer operations as the highest-risk tools, semantic rather than syntactic vulnerability mechanisms, and context-dependent attack effectiveness, alongside model-specific security profiles in absolute ASR levels and optimal injection strategies. Direct attack transfer from model-level to agentic contexts shows degraded performance (GPT-OSS-20B: 57% human injection ASR; Gemini-2.0-flash: 28%), while context-aware iterative attacks successfully compromise objectives that failed at the model level, confirming systematic evaluation gaps. These findings establish the urgent need for agentic-situational evaluation paradigms, with AgentSeer providing the standardized methodology and empirical validation.
Key Contributions
- AgentSeer observability framework that decomposes agentic executions into action and component graphs, enabling granular agentic-situational security assessment
- Empirical discovery of "agentic-only" vulnerabilities invisible to model-level evaluation, with tool-calling showing 24-60% higher ASR across GPT-OSS-20B and Gemini-2.0-flash
- Cross-model validation revealing universal agentic vulnerability patterns (agent transfer as highest-risk tool, semantic over syntactic mechanisms) versus model-specific security profiles
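To make the core ideas concrete, the sketch below shows one plausible way to decompose an agentic execution trace into an action graph and compute per-action-kind attack success rates. The paper's actual AgentSeer schema and APIs are not reproduced here; the `Action` fields, the `kind` labels (e.g. `tool_call`, `agent_transfer`), and the trace format are illustrative assumptions only.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the real AgentSeer node types are not public,
# so these fields and kind labels are illustrative assumptions.
@dataclass
class Action:
    action_id: str
    kind: str       # e.g. "llm_generation", "tool_call", "agent_transfer"
    payload: str
    parents: list = field(default_factory=list)  # upstream action_ids

def build_action_graph(trace):
    """Decompose a flat execution trace into an action graph.

    Here each action's parent is simply the preceding step; a full
    framework would also track data-flow and component edges.
    """
    graph = {}
    prev_id = None
    for step in trace:
        action = Action(step["id"], step["kind"], step["payload"],
                        parents=[prev_id] if prev_id else [])
        graph[action.action_id] = action
        prev_id = action.action_id
    return graph

def attack_success_rate(results):
    """ASR per action kind: successful attacks / total attempts."""
    by_kind = {}
    for kind, success in results:
        total, wins = by_kind.get(kind, (0, 0))
        by_kind[kind] = (total + 1, wins + int(success))
    return {k: wins / total for k, (total, wins) in by_kind.items()}

# Toy trace with the three action kinds the paper discusses.
trace = [
    {"id": "a1", "kind": "llm_generation", "payload": "plan"},
    {"id": "a2", "kind": "tool_call", "payload": "search(...)"},
    {"id": "a3", "kind": "agent_transfer", "payload": "handoff"},
]
graph = build_action_graph(trace)

# Aggregating injection outcomes by action kind surfaces which
# actions are most vulnerable, mirroring the paper's ASR analysis.
asr = attack_success_rate([("tool_call", True), ("tool_call", True),
                           ("tool_call", False), ("llm_generation", False)])
```

Grouping outcomes by action kind is what lets an observability layer surface "agentic-only" patterns, such as tool-calling actions showing markedly higher ASR than plain generation steps.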