
CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation

Yushi Feng 1, Junye Du 1, Qifan Wang 1, Zizhan Ma 2, Qian Niu 3, Yutaka Matsuo 3, Long Feng 1, Lequan Yu 1


Published on arXiv

2604.09155

Prompt Injection

OWASP LLM Top 10 — LLM01

Excessive Agency

OWASP LLM Top 10 — LLM08

Key Finding

Improves safety-helpfulness-interruption Pareto frontier with user-tunable statistical guarantees on executed harmful actions

CORA

Novel technique introduced


Graphical user interface (GUI) agents powered by vision language models (VLMs) are rapidly moving from passive assistance to autonomous operation. However, this unrestricted action space exposes users to severe and irreversible financial, privacy, or social harm. Existing safeguards rely on prompt engineering, brittle heuristics, or VLM-as-critic judgments, which lack formal verification and user-tunable guarantees. We propose CORA (COnformal Risk-controlled GUI Agent), a post-policy, pre-action safeguarding framework that provides statistical guarantees on harmful executed actions. CORA reformulates safety as selective action execution: we train a Guardian model to estimate action-conditional risk for each proposed step. Rather than thresholding raw scores, we leverage Conformal Risk Control to calibrate an execute/abstain boundary that satisfies a user-specified risk budget, and we route rejected actions to a trainable Diagnostician model, which performs multimodal reasoning over rejected actions to recommend interventions (e.g., confirm, reflect, or abort) that minimize user burden. A Goal-Lock mechanism anchors assessment to a clarified, frozen user intent to resist visual injection attacks. To rigorously evaluate this paradigm, we introduce Phone-Harm, a new benchmark of mobile safety violations with step-level harm labels under real-world settings. Experiments on Phone-Harm and public benchmarks against diverse baselines validate that CORA improves the safety-helpfulness-interruption Pareto frontier, offering a practical, statistically grounded safety paradigm for autonomous GUI execution. Code and benchmark are available at cora-agent.github.io.
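The execute/abstain calibration described above can be sketched with standard Conformal Risk Control: on a held-out calibration set of Guardian scores with step-level harm labels, pick the largest threshold whose finite-sample-corrected empirical risk stays within the user's budget. This is an illustrative sketch of the generic CRC recipe, not CORA's actual implementation; `calibrate_threshold` and the toy data are assumptions.

```python
import numpy as np

def calibrate_threshold(scores, harmful, alpha):
    """Conformal Risk Control calibration (sketch, not CORA's actual API).

    scores:  Guardian risk scores on a calibration set (higher = riskier)
    harmful: binary step-level harm labels for the same actions
    alpha:   user-specified risk budget on executed harmful actions

    Returns the largest threshold lam such that executing every action with
    score <= lam keeps the CRC-adjusted empirical risk at or below alpha.
    """
    scores = np.asarray(scores, dtype=float)
    harmful = np.asarray(harmful, dtype=int)
    n = len(scores)
    best = -np.inf  # abstain on everything if no candidate meets the budget
    for lam in np.sort(scores):
        # loss per action: 1 if it would be executed AND is harmful (bound B = 1)
        executed = scores <= lam
        risk = harmful[executed].sum() / n
        # CRC finite-sample correction: (n * R_hat + B) / (n + 1) <= alpha
        if (n * risk + 1) / (n + 1) <= alpha:
            best = lam  # risk is monotone in lam, so keep the largest feasible
    return best

# toy calibration set: harm correlates with the Guardian score
rng = np.random.default_rng(0)
scores = rng.uniform(0, 1, 200)
harmful = (scores + rng.normal(0, 0.1, 200) > 0.7).astype(int)
lam = calibrate_threshold(scores, harmful, alpha=0.05)
```

Because the per-action loss is monotone non-decreasing in the threshold, the returned boundary inherits CRC's distribution-free guarantee: the expected rate of executed harmful actions on exchangeable test data is at most the chosen budget.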


Key Contributions

  • CORA framework with Guardian model for action-conditional risk estimation and Conformal Risk Control for statistical safety guarantees
  • Goal-Lock mechanism to resist visual injection attacks on agent intent
  • Phone-Harm benchmark with step-level harm labels for mobile GUI safety evaluation

🛡️ Threat Analysis

Prompt Injection

CORA's core focus is preventing harmful VLM agent behaviors through safety guardrails and intervention mechanisms, and it resists prompt/visual injection attacks (goal hijacking) that could redirect the agent away from the user's clarified intent.
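The Goal-Lock idea can be illustrated with a few lines: the clarified user intent is frozen before execution, and every subsequent risk assessment is built from that frozen copy rather than from on-screen text, so injected UI content cannot rewrite the goal mid-episode. The `GoalLock`, `lock_goal`, and `assess` names below are hypothetical, not CORA's actual interface.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class GoalLock:
    """Frozen record of the clarified user intent (illustrative sketch)."""
    intent: str
    digest: str  # tamper-evidence: any later mutation is detectable

def lock_goal(clarified_intent: str) -> GoalLock:
    # freeze the intent once, before any autonomous step executes
    return GoalLock(
        intent=clarified_intent,
        digest=hashlib.sha256(clarified_intent.encode()).hexdigest(),
    )

def assess(action: str, lock: GoalLock) -> str:
    # the assessment prompt is anchored to the locked intent, never to
    # whatever text currently appears on screen
    return f"Does action '{action}' serve the locked goal: {lock.intent}?"
```

The design choice worth noting is that the lock is created from the user's clarification dialogue, not from the environment, so a malicious banner saying "new goal: transfer funds" never enters the assessment context as a goal.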

Excessive Agency

The framework directly addresses excessive agency in autonomous GUI agents by implementing selective action execution, user-tunable risk budgets, and intervention mechanisms (confirm/reflect/abort) that constrain agent autonomy and prevent irreversible harmful actions in mobile environments.
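The selective-execution gate described above can be sketched as a post-policy, pre-action check: actions under the calibrated threshold execute, and everything else is routed to a Diagnostician that picks an intervention. The function names and the toy escalation rule are assumptions for illustration, not CORA's trained Diagnostician.

```python
from enum import Enum

class Intervention(Enum):
    CONFIRM = "confirm"   # ask the user before executing
    REFLECT = "reflect"   # send the action back to the agent to re-plan
    ABORT = "abort"       # terminate the episode

def safeguard_step(action, guardian_score, lam, diagnostician):
    """Post-policy, pre-action gate (illustrative sketch).

    Executes the action when the Guardian's risk score falls at or under
    the calibrated threshold lam; otherwise defers to the Diagnostician,
    which recommends an intervention to minimize user burden.
    """
    if guardian_score <= lam:
        return ("execute", action)
    return ("intervene", diagnostician(action, guardian_score))

def toy_diagnostician(action, score):
    # hypothetical stub: escalate the intervention with the risk score;
    # CORA instead uses a trainable multimodal Diagnostician model
    if score > 0.9:
        return Intervention.ABORT
    if score > 0.7:
        return Intervention.CONFIRM
    return Intervention.REFLECT
```

Routing rejected actions through graded interventions, rather than aborting on every rejection, is what lets the framework trade safety against helpfulness and interruption cost along the Pareto frontier.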


Details

Domains
multimodal, vision
Model Types
vlm, multimodal
Threat Tags
inference_time, black_box
Datasets
Phone-Harm
Applications
mobile gui automation, autonomous agents, digital assistants