Latest papers

6 papers
defense arXiv Apr 27, 2026

LAVA: Layered Audio-Visual Anti-tampering Watermarking for Robust Deepfake Detection and Localization

Bokang Zeng, Zheng Gao, Xiaoyu Li et al. · UNSW Sydney · Griffith University

Audio-visual watermarking framework that detects and localizes deepfake tampering in videos while surviving compression and multimodal misalignment

Output Integrity Attack · multimodal · vision · audio
tool arXiv Mar 12, 2026

OpenClaw PRISM: A Zero-Fork, Defense-in-Depth Runtime Security Layer for Tool-Augmented LLM Agents

Frank Li · UNSW Sydney

Deployable runtime security layer for LLM agent gateways defending against prompt injection and unsafe tool execution across ten lifecycle hooks

Prompt Injection · Insecure Plugin Design · nlp
attack arXiv Jan 23, 2026

DeMark: A Query-Free Black-Box Attack on Deepfake Watermarking Defenses

Wei Song, Zhenchang Xing, Liming Zhu et al. · UNSW Sydney · CSIRO’s Data61

Attacks deepfake watermarking defenses using compressive sensing to suppress watermark signals without querying the target model

Output Integrity Attack · vision · generative
attack arXiv Jan 19, 2026

In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement

Anudeex Shetty, Aditya Joshi, Salil S. Kanhere · UNSW Sydney · The University of Melbourne

Novel drunk-persona jailbreak attack that bypasses LLM safety tuning and induces privacy leaks across five models

Prompt Injection · Sensitive Information Disclosure · nlp
survey arXiv Sep 10, 2025

Adversarial Attacks Against Automated Fact-Checking: A Survey

Fanzhen Liu, Alsharif Abuadbba, Kristen Moore et al. · Macquarie University · CSIRO’s Data61 +1 more

Surveys adversarial attacks against automated fact-checking ML models, covering claim manipulation, evidence injection, and adversary-aware defenses

Input Manipulation Attack · Data Poisoning Attack · Prompt Injection · nlp · multimodal
benchmark arXiv Aug 18, 2025

Systematic Analysis of MCP Security

Yongjian Guo, Puzhuo Liu, Wanlun Ma et al. · Tsinghua University · Ant Group +3 more

Catalogs 31 MCP attack methods into a unified library, empirically revealing LLM agent vulnerabilities in tool-use protocols

Insecure Plugin Design · Prompt Injection · nlp