
Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks

Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty

Published on arXiv (2512.23557) · 2 citations · 15 references

Prompt Injection — OWASP LLM Top 10, LLM01

Excessive Agency — OWASP LLM Top 10, LLM08

Key Finding

Multimodal injection detection accuracy is significantly enhanced and cross-agent trust leakage is minimized, with agentic execution pathways stabilized across LangChain and GraphChain workflows

Cross-Agent Multimodal Provenance-Aware Defense Framework

Novel technique introduced


Abstract

Large Language Models (LLMs), Vision-Language Models (VLMs), and emerging agentic AI frameworks such as LangChain and GraphChain enable powerful autonomous systems that reason, plan, and converse across numerous tools and agents. This agentic environment, however, raises the likelihood of multimodal prompt injection (PI) attacks, in which concealed or malicious instructions carried in text, images, metadata, or agent-to-agent messages can propagate through the graph and cause unintended behavior, policy violations, or state corruption. To mitigate these risks, this paper proposes a Cross-Agent Multimodal Provenance-Aware Defense Framework in which every prompt, whether user-generated or produced by an upstream agent, is sanitized, and every LLM output is independently verified before being passed to downstream nodes. The framework comprises a text sanitizer agent, a visual sanitizer agent, and an output validator agent, all coordinated by a provenance ledger that records modality, source, and trust level across the entire agent network. This architecture ensures that agent-to-agent communication respects explicit trust boundaries, so injected instructions cannot propagate through LangChain- or GraphChain-style workflows. Experimental evaluation shows that multimodal injection detection accuracy is significantly improved, cross-agent trust leakage is minimized, and agentic execution pathways become stable. By extending provenance tracking and validation to multi-agent orchestration, the framework advances the development of secure, interpretable, and reliable agentic AI systems.
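The provenance ledger described in the abstract can be pictured as a small trust-state machine attached to every message in the agent graph. The sketch below is illustrative only: the `Trust` levels, the `ProvenanceLedger` class, and the one-step promotion rule are our own assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict

class Trust(Enum):
    UNTRUSTED = 0  # raw user or external input
    SANITIZED = 1  # passed a text/visual sanitizer agent
    VERIFIED = 2   # output validated; may cross agent boundaries

@dataclass
class ProvenanceRecord:
    message_id: str
    modality: str  # e.g. "text", "image", "metadata", "agent_message"
    source: str    # originating user or agent id
    trust: Trust

class ProvenanceLedger:
    """Tracks modality, source, and trust level for every message in the graph."""

    def __init__(self) -> None:
        self._records: Dict[str, ProvenanceRecord] = {}

    def register(self, message_id: str, modality: str, source: str) -> None:
        # Every new input starts untrusted, regardless of its origin.
        self._records[message_id] = ProvenanceRecord(
            message_id, modality, source, Trust.UNTRUSTED
        )

    def promote(self, message_id: str, new_trust: Trust) -> None:
        rec = self._records[message_id]
        # Trust may only be raised one level at a time, by a dedicated agent.
        if new_trust.value != rec.trust.value + 1:
            raise ValueError(f"illegal trust transition {rec.trust} -> {new_trust}")
        rec.trust = new_trust

    def may_forward(self, message_id: str) -> bool:
        # Only fully verified messages may be sent to downstream nodes.
        return self._records[message_id].trust is Trust.VERIFIED
```

Under this sketch, a message injected into an image caption stays `UNTRUSTED` until the visual sanitizer promotes it, so it can never reach a downstream agent directly.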


Key Contributions

  • Cross-Agent Multimodal Provenance-Aware Defense Framework integrating a text sanitizer agent, visual sanitizer agent, and output validator agent coordinated by a shared provenance ledger
  • Provenance ledger that tracks modality, source, and trust level across an entire multi-agent network to prevent injection propagation to downstream nodes
  • End-to-end trust architecture for LangChain/GraphChain-style workflows that enforces clear trust boundaries on all agent-to-agent communications
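The sanitizer and validator roles listed above can be sketched as simple gate functions at the trust boundary. Note that the regex patterns and the `CALL(tool)` output convention below are hypothetical placeholders; the paper's agents would use far richer, model-based detection.

```python
import re

# Hypothetical injection markers; a real sanitizer agent would use an
# LLM- or classifier-based detector rather than a fixed pattern list.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"system prompt",
    r"you are now",
]

def text_sanitizer(prompt: str) -> str:
    """Redact suspected injected instructions before the planner sees them."""
    cleaned = prompt
    for pat in INJECTION_PATTERNS:
        cleaned = re.sub(pat, "[REDACTED]", cleaned, flags=re.IGNORECASE)
    return cleaned

def output_validator(output: str, allowed_tools: set) -> bool:
    """Check that an LLM output invokes only whitelisted tools
    before it is forwarded to downstream agents."""
    called = set(re.findall(r"CALL\((\w+)\)", output))
    return called <= allowed_tools
```

In a LangChain- or GraphChain-style workflow, these gates would wrap every edge of the agent graph, with the provenance ledger recording which gate each message has passed.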

Details

Domains: nlp, multimodal
Model Types: llm, vlm, multimodal
Threat Tags: inference_time, digital
Applications: multi-agent ai systems, langchain workflows, graphchain workflows, vlm-based agentic pipelines