Latest papers

17 papers
attack arXiv Mar 24, 2026 · 13d ago

AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI Agents

Yutao Luo, Haotian Zhu, Shuchao Pang et al. · Nanjing University of Science and Technology · Macquarie University +3 more

Backdoor attack on mobile GUI agents that uses benign-looking notification icons to trigger malicious actions at a 90%+ success rate

Model Poisoning vision multimodal
PDF
attack arXiv Mar 16, 2026 · 21d ago

From Storage to Steering: Memory Control Flow Attacks on LLM Agents

Zhenlin Xu, Xiaogang Zhu, Yu Yao et al. · Adelaide University · The University of Sydney +1 more

Memory poisoning attack on LLM agents that hijacks tool-selection control flow across tasks via retrieval of malicious memory entries

Prompt Injection Excessive Agency nlp
PDF
defense arXiv Feb 11, 2026 · 7w ago

Mitigating Gradient Inversion Risks in Language Models via Token Obfuscation

Xinguo Feng, Zhongkui Ma, Zihan Wang et al. · The University of Queensland · CSIRO’s Data61 +1 more

Defends collaborative LLM training against gradient inversion by replacing tokens with semantically disconnected yet embedding-proximate shadow substitutes

Model Inversion Attack Sensitive Information Disclosure nlp federated-learning
PDF
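
How such token obfuscation might look in code: the sketch below picks a "shadow" substitute that is close in embedding space (so shared gradients stay plausible) but semantically unrelated (so a successful inversion reconstructs little). This is a minimal illustration of the general idea, not the paper's algorithm; the second "semantic" embedding table, the threshold, and the neighborhood size k are all assumptions.

```python
import numpy as np

def shadow_substitute(token_id, grad_emb, sem_emb, sem_threshold=0.3, k=50):
    """Pick a 'shadow' substitute: close in the embedding space an inversion
    attacker reconstructs from, yet semantically far from the original token.
    grad_emb, sem_emb: (vocab, d) row-normalized embedding matrices.
    All names and thresholds here are illustrative, not the paper's API."""
    prox = grad_emb @ grad_emb[token_id]      # cosine proximity to the original
    prox[token_id] = -np.inf                  # never substitute a token with itself
    candidates = np.argsort(prox)[::-1][:k]   # k nearest embedding neighbors
    sem = sem_emb @ sem_emb[token_id]         # similarity in a separate semantic space
    safe = [c for c in candidates if sem[c] < sem_threshold]
    return safe[0] if safe else candidates[-1]

# toy demo with random stand-in embeddings
rng = np.random.default_rng(0)
normed = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
grad_emb = normed(rng.normal(size=(1000, 64)))
sem_emb = normed(rng.normal(size=(1000, 64)))
print(shadow_substitute(42, grad_emb, sem_emb))
```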
attack arXiv Jan 23, 2026 · 10w ago

DeMark: A Query-Free Black-Box Attack on Deepfake Watermarking Defenses

Wei Song, Zhenchang Xing, Liming Zhu et al. · UNSW Sydney · CSIRO’s Data61

Attacks deepfake watermarking defenses using compressive sensing to suppress watermark signals without querying the target model

Output Integrity Attack vision generative
PDF
benchmark arXiv Jan 14, 2026 · 11w ago

Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents

Fengchao Chen, Tingmin Wu, Van Nguyen et al. · Monash University · CSIRO’s Data61

Benchmarks user-mediated indirect prompt injection attacks on 12 commercial LLM agents, showing safety-bypass rates above 92% and excessive-agency risks

Prompt Injection Excessive Agency nlp
2 citations PDF
defense arXiv Jan 3, 2026 · Jan 2026

NADD: Amplifying Noise for Effective Diffusion-based Adversarial Purification

David D. Nguyen, The-Anh Ta, Yansong Gao et al. · CSIRO’s Data61 · University of Western Australia

Diffusion-based adversarial purification defense that amplifies noise and applies ring proximity correction, reaching 44.23% robust accuracy on ImageNet while running 47× faster than prior art

Input Manipulation Attack vision
PDF
defense arXiv Dec 13, 2025 · Dec 2025

Keep the Lights On, Keep the Lengths in Check: Plug-In Adversarial Detection for Time-Series LLMs in Energy Forecasting

Hua Ma, Ruoxi Sun, Minhui Xue et al. · CSIRO’s Data61 · The University of Melbourne +2 more

Defends time-series LLMs against adversarial inputs using sampling-induced divergence to detect perturbed energy forecasting sequences

Input Manipulation Attack timeseries nlp
PDF
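
The detection recipe the summary hints at can be sketched generically: sample the stochastic forecaster several times and flag inputs whose sampled forecasts disagree unusually strongly. A minimal sketch under that assumption; forecast_fn, the divergence statistic, and the calibration threshold are hypothetical, not the paper's interface.

```python
import numpy as np

def sampling_divergence(forecast_fn, x, n_samples=8):
    """Sample a stochastic forecaster repeatedly and measure disagreement.
    forecast_fn(x) -> 1-D forecast array; it must be stochastic
    (e.g. nonzero sampling temperature)."""
    samples = np.stack([forecast_fn(x) for _ in range(n_samples)])
    return samples.std(axis=0).mean()  # mean per-horizon-step spread

def is_adversarial(forecast_fn, x, threshold):
    # threshold would be calibrated on clean sequences, e.g. the 95th
    # percentile of divergence over a known-benign validation set
    return sampling_divergence(forecast_fn, x) > threshold

# toy demo: a stand-in stochastic forecaster over a 24-step horizon
rng = np.random.default_rng(0)
toy_forecaster = lambda x: x.mean() + 0.1 * rng.normal(size=24)
print(sampling_divergence(toy_forecaster, np.ones(96)))
```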
defense arXiv Nov 24, 2025 · Nov 2025

Re-Key-Free, Risky-Free: Adaptable Model Usage Control

Zihan Wang, Zhongkui Ma, Xinguo Feng et al. · The University of Queensland · CSIRO’s Data61 +3 more

Defends model IP with key-locked weights that survive fine-tuning, keeping unauthorized inference at near-random performance

Model Theft vision
1 citation PDF
defense arXiv Nov 10, 2025 · Nov 2025

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

Zhisheng Zhang, Derui Wang, Yifan Mi et al. · Tsinghua University · Beijing University of Posts and Telecommunications +4 more

Proactive adversarial audio perturbations disrupt LLM-based voice cloning by targeting speaker encoders and ASR transcription simultaneously

Input Manipulation Attack Output Integrity Attack audio nlp
PDF Code
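
A hedged sketch of what a joint objective in this spirit could look like: one term pushes the perturbed waveform's speaker embedding away from the true speaker, another maximizes ASR uncertainty on the same audio. The callables spk_enc and asr_logits, the target_emb anchor, and the weighting lam are illustrative placeholders, not the paper's losses.

```python
import torch

def joint_protection_loss(delta, wav, spk_enc, asr_logits, target_emb, lam=1.0):
    """Minimizing this drives the perturbation delta to (a) make the speaker
    embedding dissimilar to the true speaker and (b) maximize ASR entropy,
    degrading transcription. All names here are hypothetical placeholders."""
    x = wav + delta
    spk_term = torch.cosine_similarity(spk_enc(x), target_emb, dim=-1).mean()
    asr_term = -torch.distributions.Categorical(logits=asr_logits(x)).entropy().mean()
    return spk_term + lam * asr_term
```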
defense arXiv Oct 30, 2025 · Oct 2025

ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models

Weifei Jin, Yuxin Cao, Junjie Su et al. · Beijing University of Posts and Telecommunications · National University of Singapore +3 more

Defends Audio-Language Models against audio-based jailbreaks using universal acoustic perturbations that activate inherent model safety shortcuts

Input Manipulation Attack Prompt Injection audio multimodal nlp
1 citation PDF Code
benchmark arXiv Oct 27, 2025 · Oct 2025

Through the Lens: Benchmarking Deepfake Detectors Against Moiré-Induced Distortions

Razaib Tariq, Minji Heo, Simon S. Woo et al. · Sungkyunkwan University · CSIRO’s Data61

Benchmarks 15 deepfake detectors against Moiré artifacts, showing accuracy drops of up to 25.4%, with demoiréing methods further degrading detection

Output Integrity Attack vision
PDF
defense arXiv Oct 13, 2025 · Oct 2025

Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization

Zihan Wang, Zhiyong Ma, Zhongkui Ma et al. · The University of Queensland · CSIRO’s Data61 +1 more

Recodes inputs into an authorized model's insensitivity subspace so only that model can process them, blocking unauthorized model exploitation

Model Theft vision multimodal
3 citations PDF Code
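
The insensitivity-subspace idea has a clean linear analogue: perturb the input only along the null space of the authorized model's weight matrix, so that model's output is untouched while any other model sees a heavily distorted input. A toy numpy illustration of the principle, not the paper's construction (which handles nonlinear networks):

```python
import numpy as np
rng = np.random.default_rng(0)

W_auth = rng.normal(size=(4, 16))   # authorized model: maps R^16 -> R^4
W_other = rng.normal(size=(4, 16))  # some unauthorized model

_, s, Vt = np.linalg.svd(W_auth)
null_basis = Vt[W_auth.shape[0]:]   # directions W_auth is provably insensitive to

x = rng.normal(size=16)
delta = 10.0 * null_basis.T @ rng.normal(size=null_basis.shape[0])
x_recoded = x + delta               # heavy distortion, invisible to W_auth

print(np.linalg.norm(W_auth @ x_recoded - W_auth @ x))    # ~0: authorized output intact
print(np.linalg.norm(W_other @ x_recoded - W_other @ x))  # large: other models break
```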
attack arXiv Sep 25, 2025 · Sep 2025

Poisoning Prompt-Guided Sampling in Video Large Language Models

Yuxin Cao, Wei Song, Jingling Xue et al. · National University of Singapore · University of New South Wales +1 more

Black-box adversarial perturbation attack suppresses harmful frame selection in VideoLLM prompt-guided sampling, achieving 82–99% success

Input Manipulation Attack Prompt Injection vision nlp multimodal
1 citation PDF
survey arXiv Sep 10, 2025 · Sep 2025

Adversarial Attacks Against Automated Fact-Checking: A Survey

Fanzhen Liu, Alsharif Abuadbba, Kristen Moore et al. · Macquarie University · CSIRO’s Data61 +1 more

Surveys adversarial attacks against automated fact-checking ML models, covering claim manipulation, evidence injection, and adversary-aware defenses

Input Manipulation Attack Data Poisoning Attack Prompt Injection nlp multimodal
PDF Code
attack arXiv Aug 21, 2025 · Aug 2025

Retrieval-Augmented Review Generation for Poisoning Recommender Systems

Shiyi Yang, Xinshu Li, Guanglin Zhou et al. · University of New South Wales · CSIRO’s Data61 +2 more

Poisons recommender systems by injecting LLM-generated fake user profiles, using retrieval-augmented in-context learning (ICL) and jailbreaking to evade detection

Data Poisoning Attack nlp
PDF
attack arXiv Aug 14, 2025 · Aug 2025

Failures to Surface Harmful Contents in Video Large Language Models

Yuxin Cao, Wei Song, Derui Wang et al. · National University of Singapore · University of New South Wales +1 more

Three black-box attacks exploit VideoLLM architectural blind spots to hide harmful video content from generated summaries with a >90% success rate

Input Manipulation Attack Prompt Injection multimodal vision nlp
PDF Code
defense arXiv Aug 7, 2025 · Aug 2025

From Detection to Correction: Backdoor-Resilient Face Recognition via Vision-Language Trigger Detection and Noise-Based Neutralization

Farah Wahida, M.A.P. Chamikara, Yashothara Shanmugarasa et al. · RMIT University · CSIRO’s Data61 +1 more

Uses VLM ensemble majority voting to detect and neutralize backdoor-poisoned training images in face recognition systems

Model Poisoning vision
PDF
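
A minimal sketch of the detection half only (the noise-based neutralization stage is not modeled here): each ensemble member is a hypothetical wrapper that asks one vision-language model whether a training image carries an unexpected overlay or patch, and a strict majority flags the image as poisoned.

```python
from collections import Counter

def flag_poisoned(image, vlm_queries):
    """vlm_queries: list of callables image -> bool, each a hypothetical
    wrapper asking one VLM whether the image contains an anomalous
    overlay/patch inconsistent with a natural face photo."""
    votes = [q(image) for q in vlm_queries]
    return Counter(votes)[True] > len(votes) // 2  # strict majority flags the image

# toy demo with stand-in 'models'
print(flag_poisoned("face_001.png", [lambda im: True, lambda im: True, lambda im: False]))
```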