Latest papers

5 papers
tool · arXiv · Mar 12, 2026

OpenClaw PRISM: A Zero-Fork, Defense-in-Depth Runtime Security Layer for Tool-Augmented LLM Agents

Frank Li · UNSW Sydney

Deployable runtime security layer for LLM agent gateways that defends against prompt injection and unsafe tool execution across ten lifecycle hooks

Prompt Injection · Insecure Plugin Design · nlp
PDF
attack · arXiv · Jan 23, 2026

DeMark: A Query-Free Black-Box Attack on Deepfake Watermarking Defenses

Wei Song, Zhenchang Xing, Liming Zhu et al. · UNSW Sydney · CSIRO’s Data61

Attacks deepfake watermarking defenses using compressive sensing to suppress watermark signals without querying the target model

Output Integrity Attack · vision · generative
PDF
attack · arXiv · Jan 19, 2026

In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement

Anudeex Shetty, Aditya Joshi, Salil S. Kanhere · UNSW Sydney · The University of Melbourne

Novel drunk-persona jailbreak attack on LLMs bypasses safety tuning and induces privacy leaks across five models

Prompt Injection · Sensitive Information Disclosure · nlp
PDF
survey · arXiv · Sep 10, 2025

Adversarial Attacks Against Automated Fact-Checking: A Survey

Fanzhen Liu, Alsharif Abuadbba, Kristen Moore et al. · Macquarie University · CSIRO’s Data61 +1 more

Surveys adversarial attacks against automated fact-checking ML models, covering claim manipulation, evidence injection, and adversary-aware defenses

Input Manipulation Attack · Data Poisoning Attack · Prompt Injection · nlp · multimodal
PDF Code
benchmark · arXiv · Aug 18, 2025

Systematic Analysis of MCP Security

Yongjian Guo, Puzhuo Liu, Wanlun Ma et al. · Tsinghua University · Ant Group +3 more

Catalogs 31 MCP attack methods into a unified library, empirically revealing LLM agent vulnerabilities in tool-use protocols

Insecure Plugin Design · Prompt Injection · nlp
PDF