Securing AI Agents in Cyber-Physical Systems: A Survey of Environmental Interactions, Deepfake Threats, and Defenses
Mohsen Hatami, Van Tuan Pham, Hozefa Lakadawala, Yu Chen
Published on arXiv (2601.20184)
Threat Categories
- Output Integrity Attack (OWASP ML Top 10 — ML09)
- Insecure Plugin Design (OWASP LLM Top 10 — LLM07)
- Excessive Agency (OWASP LLM Top 10 — LLM08)
Key Finding
Detection mechanisms alone are insufficient as decision authorities in safety-critical CPS; provenance- and physics-grounded trust mechanisms combined with defense-in-depth architectures are necessary for deployable security under real CPS constraints.
SENTINEL (novel technique introduced)
The increasing integration of AI agents into cyber-physical systems (CPS) introduces new security risks that extend beyond traditional cyber or physical threat models. Recent advances in generative AI enable deepfake and semantic manipulation attacks that can compromise agent perception, reasoning, and interaction with the physical environment, while emerging protocols such as the Model Context Protocol (MCP) further expand the attack surface through dynamic tool use and cross-domain context sharing. This survey provides a comprehensive review of security threats targeting AI agents in CPS, with a particular focus on environmental interactions, deepfake-driven attacks, and MCP-mediated vulnerabilities. We organize the literature using the SENTINEL framework, a lifecycle-aware methodology that integrates threat characterization, feasibility analysis under CPS constraints, defense selection, and continuous validation. Through an end-to-end case study grounded in a real-world smart grid deployment, we quantitatively illustrate how timing, noise, and false-positive costs constrain deployable defenses, and why detection mechanisms alone are insufficient as decision authorities in safety-critical CPS. The survey highlights the role of provenance- and physics-grounded trust mechanisms and defense-in-depth architectures, and outlines open challenges toward trustworthy AI-enabled CPS.
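The abstract's point that false-positive costs constrain deployable defenses can be made concrete with a simple expected-cost model. The model below is an illustrative sketch, not taken from the paper: all parameter names and values (attack prevalence, detector true/false-positive rates, per-event costs) are assumptions chosen to show why a detector with a seemingly low false-positive rate can still be unusable as a sole decision authority when benign events vastly outnumber attacks.

```python
def expected_detector_cost(p_attack, tpr, fpr, cost_miss, cost_false_alarm, n_events=1):
    """Expected per-deployment cost of using a detector as the sole
    decision authority. Misses occur on attacks the detector fails to
    flag; false alarms occur on benign events it wrongly flags.
    (Hypothetical cost model for illustration.)"""
    p_benign = 1.0 - p_attack
    miss_cost = p_attack * (1.0 - tpr) * cost_miss
    false_alarm_cost = p_benign * fpr * cost_false_alarm
    return n_events * (miss_cost + false_alarm_cost)

# With rare attacks (1%), a 90%-accurate detector at a 5% false-positive
# rate accrues false-alarm cost comparable to its residual miss cost,
# even when a miss is 100x more expensive than a false alarm:
cost = expected_detector_cost(
    p_attack=0.01, tpr=0.90, fpr=0.05,
    cost_miss=1000.0, cost_false_alarm=10.0,
)
```

Under these assumed numbers the miss term contributes 1.0 and the false-alarm term 0.495 per event, so in a CPS where a false alarm triggers a costly protective action (e.g., load shedding), the detector's verdict alone is a poor trip signal.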
Key Contributions
- SENTINEL framework: a lifecycle-aware, six-phase methodology for structured threat characterization, feasibility analysis under CPS constraints, defense selection, and continuous validation
- Comprehensive taxonomy of deepfake and semantic manipulation attacks across visual, audio, textual, and behavioral modalities as they affect AI agents in CPS
- Quantitative smart-grid case study illustrating how timing, noise, and false-positive costs constrain deployable defenses, and why detection alone is insufficient as a decision authority in safety-critical settings
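The "physics-grounded trust" idea in the contributions can be illustrated with a minimal plausibility check on grid-frequency telemetry: rather than trusting a learned detector's verdict, the agent rejects readings whose implied rate of change of frequency (RoCoF) violates a physical bound. The function name and the 0.5 Hz/s limit below are illustrative assumptions for a sketch, not parameters from the paper or from any grid standard.

```python
def rocof_violations(freq_samples_hz, dt_s=1.0, max_rocof_hz_per_s=0.5):
    """Return indices of samples whose rate of change of frequency
    relative to the previous sample exceeds a physical plausibility
    bound. A fabricated (e.g., deepfaked) sensor stream that jumps
    faster than grid inertia permits is flagged regardless of how
    realistic each individual value looks. (Illustrative sketch.)"""
    violations = []
    for i in range(1, len(freq_samples_hz)):
        rocof = abs(freq_samples_hz[i] - freq_samples_hz[i - 1]) / dt_s
        if rocof > max_rocof_hz_per_s:
            violations.append(i)
    return violations

# A spoofed dip and recovery between 1 s samples implies ~0.8 Hz/s,
# which exceeds the assumed plausibility bound:
flagged = rocof_violations([60.0, 60.01, 59.2, 60.0])
```

A check like this is cheap enough to run inline under CPS timing constraints, which is exactly the feasibility dimension the SENTINEL case study emphasizes.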
🛡️ Threat Analysis
The dominant theme is deepfake and AI-generated content attacks (across visual, audio, textual, and sensor modalities) that manipulate AI agent perception, alongside defenses including content provenance, watermarking, and forensic detection — all of which map squarely to ML09 output integrity.
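One simple form the content-provenance defenses above can take is a keyed message authentication code attached to each sensor frame at the source, so downstream agents can verify origin and integrity before acting. This is a minimal sketch using Python's standard `hmac` module under assumed framing; the survey discusses provenance mechanisms broadly, and this is not presented as the paper's specific design.

```python
import hmac
import hashlib

def sign_frame(key: bytes, frame: bytes) -> bytes:
    """Compute an HMAC-SHA256 provenance tag over a sensor frame at the
    trusted source. (Key management is out of scope for this sketch.)"""
    return hmac.new(key, frame, hashlib.sha256).digest()

def verify_frame(key: bytes, frame: bytes, tag: bytes) -> bool:
    """Constant-time verification that the frame was produced by a
    holder of the key and was not altered in transit — e.g., replaced
    by injected deepfake content."""
    return hmac.compare_digest(sign_frame(key, frame), tag)

key = b"example-shared-key"            # placeholder key for illustration
frame = b"meter_07|freq_hz=60.01|t=1700000000"
tag = sign_frame(key, frame)
ok = verify_frame(key, frame, tag)                       # genuine frame
tampered = verify_frame(key, b"meter_07|freq_hz=52.00|t=1700000000", tag)
```

Provenance checks like this complement, rather than replace, forensic detection: they establish who produced a frame, while physics- and forensics-based checks assess whether its content is plausible.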