Securing AI Agents in Cyber-Physical Systems: A Survey of Environmental Interactions, Deepfake Threats, and Defenses
Mohsen Hatami, Van Tuan Pham, Hozefa Lakadawala, Yu Chen
Published on arXiv (2601.20184)
Threat Categories
- Output Integrity Attack (OWASP ML Top 10 — ML09)
- Insecure Plugin Design (OWASP LLM Top 10 — LLM07)
- Excessive Agency (OWASP LLM Top 10 — LLM08)
Key Finding
Detection mechanisms alone are insufficient as decision authorities in safety-critical CPS; provenance- and physics-grounded trust mechanisms combined with defense-in-depth architectures are necessary for deployable security under real CPS constraints.
SENTINEL (novel technique introduced)
The increasing integration of AI agents into cyber-physical systems (CPS) introduces new security risks that extend beyond traditional cyber or physical threat models. Recent advances in generative AI enable deepfake and semantic manipulation attacks that can compromise agent perception, reasoning, and interaction with the physical environment, while emerging protocols such as the Model Context Protocol (MCP) further expand the attack surface through dynamic tool use and cross-domain context sharing. This survey provides a comprehensive review of security threats targeting AI agents in CPS, with a particular focus on environmental interactions, deepfake-driven attacks, and MCP-mediated vulnerabilities. We organize the literature using the SENTINEL framework, a lifecycle-aware methodology that integrates threat characterization, feasibility analysis under CPS constraints, defense selection, and continuous validation. Through an end-to-end case study grounded in a real-world smart grid deployment, we quantitatively illustrate how timing, noise, and false-positive costs constrain deployable defenses, and why detection mechanisms alone are insufficient as decision authorities in safety-critical CPS. The survey highlights the role of provenance- and physics-grounded trust mechanisms and defense-in-depth architectures, and outlines open challenges toward trustworthy AI-enabled CPS.
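The abstract's point that false-positive costs constrain deployable defenses can be made concrete with a simple expected-cost model. The model below is an illustrative sketch, not taken from the paper: all parameter names and values (attack prevalence, detector true/false-positive rates, per-event costs) are assumptions chosen to show why a detector with a seemingly low false-positive rate can still be unusable as a sole decision authority when benign events vastly outnumber attacks.

```python
def expected_detector_cost(p_attack, tpr, fpr, cost_miss, cost_false_alarm, n_events=1):
    """Expected per-deployment cost of using a detector as the sole
    decision authority. Misses occur on attacks the detector fails to
    flag; false alarms occur on benign events it wrongly flags.
    (Hypothetical cost model for illustration.)"""
    p_benign = 1.0 - p_attack
    miss_cost = p_attack * (1.0 - tpr) * cost_miss
    false_alarm_cost = p_benign * fpr * cost_false_alarm
    return n_events * (miss_cost + false_alarm_cost)

# With rare attacks (1%), a 90%-accurate detector at a 5% false-positive
# rate accrues false-alarm cost comparable to its residual miss cost,
# even when a miss is 100x more expensive than a false alarm:
cost = expected_detector_cost(
    p_attack=0.01, tpr=0.90, fpr=0.05,
    cost_miss=1000.0, cost_false_alarm=10.0,
)
```

Under these assumed numbers the miss term contributes 1.0 and the false-alarm term 0.495 per event, so in a CPS where a false alarm triggers a costly protective action (e.g., load shedding), the detector's verdict alone is a poor trip signal.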
Key Contributions
- SENTINEL framework: a lifecycle-aware, six-phase methodology for structured threat characterization, feasibility analysis under CPS constraints, defense selection, and continuous validation
- Comprehensive taxonomy of deepfake and semantic manipulation attacks across visual, audio, textual, and behavioral modalities as they affect AI agents in CPS
- Quantitative smart-grid case study illustrating how timing, noise, and false-positive costs constrain deployable defenses, and why detection alone is insufficient as a decision authority in safety-critical settings
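The "physics-grounded trust" idea in the contributions can be illustrated with a minimal plausibility check on grid-frequency telemetry: rather than trusting a learned detector's verdict, the agent rejects readings whose implied rate of change of frequency (RoCoF) violates a physical bound. The function name and the 0.5 Hz/s limit below are illustrative assumptions for a sketch, not parameters from the paper or from any grid standard.

```python
def rocof_violations(freq_samples_hz, dt_s=1.0, max_rocof_hz_per_s=0.5):
    """Return indices of samples whose rate of change of frequency
    relative to the previous sample exceeds a physical plausibility
    bound. A fabricated (e.g., deepfaked) sensor stream that jumps
    faster than grid inertia permits is flagged regardless of how
    realistic each individual value looks. (Illustrative sketch.)"""
    violations = []
    for i in range(1, len(freq_samples_hz)):
        rocof = abs(freq_samples_hz[i] - freq_samples_hz[i - 1]) / dt_s
        if rocof > max_rocof_hz_per_s:
            violations.append(i)
    return violations

# A spoofed dip and recovery between 1 s samples implies ~0.8 Hz/s,
# which exceeds the assumed plausibility bound:
flagged = rocof_violations([60.0, 60.01, 59.2, 60.0])
```

A check like this is cheap enough to run inline under CPS timing constraints, which is exactly the feasibility dimension the SENTINEL case study emphasizes.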
🛡️ Threat Analysis
The dominant theme is deepfake and AI-generated content attacks (across visual, audio, textual, and sensor modalities) that manipulate AI agent perception, alongside defenses including content provenance, watermarking, and forensic detection — all of which map squarely to ML09 output integrity.
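One simple form the content-provenance defenses above can take is a keyed message authentication code attached to each sensor frame at the source, so downstream agents can verify origin and integrity before acting. This is a minimal sketch using Python's standard `hmac` module under assumed framing; the survey discusses provenance mechanisms broadly, and this is not presented as the paper's specific design.

```python
import hmac
import hashlib

def sign_frame(key: bytes, frame: bytes) -> bytes:
    """Compute an HMAC-SHA256 provenance tag over a sensor frame at the
    trusted source. (Key management is out of scope for this sketch.)"""
    return hmac.new(key, frame, hashlib.sha256).digest()

def verify_frame(key: bytes, frame: bytes, tag: bytes) -> bool:
    """Constant-time verification that the frame was produced by a
    holder of the key and was not altered in transit — e.g., replaced
    by injected deepfake content."""
    return hmac.compare_digest(sign_frame(key, frame), tag)

key = b"example-shared-key"            # placeholder key for illustration
frame = b"meter_07|freq_hz=60.01|t=1700000000"
tag = sign_frame(key, frame)
ok = verify_frame(key, frame, tag)                       # genuine frame
tampered = verify_frame(key, b"meter_07|freq_hz=52.00|t=1700000000", tag)
```

Provenance checks like this complement, rather than replace, forensic detection: they establish who produced a frame, while physics- and forensics-based checks assess whether its content is plausible.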