Latest papers

12 papers
attack arXiv Mar 17, 2026 · 20d ago

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang et al. · University of Technology Sydney · Griffith University +2 more

Backdoor framework for semantic segmentation introducing six attack vectors and optimized triggers, bypassing existing defenses

Model Poisoning Data Poisoning Attack vision
PDF
defense arXiv Mar 2, 2026 · 5w ago

Process Over Outcome: Cultivating Forensic Reasoning for Generalizable Multimodal Manipulation Detection

Yuchen Zhang, Yaxiong Wang, Kecheng Han et al. · Xi’an Jiaotong University · Hefei University of Technology +3 more

Proposes REFORM, a forensic-reasoning framework with curriculum learning and RL to generalize multimodal deepfake detection

Output Integrity Attack multimodalvisionnlpgenerative
PDF
survey arXiv Feb 24, 2026 · 5w ago

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Yanna Jiang, Delong Li, Haiyu Deng et al. · University of Technology Sydney · CSIRO

Surveys LLM agentic skill security covering marketplace supply-chain attacks, prompt injection via skill payloads, and trust-tiered execution

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlpreinforcement-learning
PDF
benchmark arXiv Feb 18, 2026 · 6w ago

The Vulnerability of LLM Rankers to Prompt Injection Attacks

Yu Yin, Shuai Wang, Bevan Koopman et al. · The University of Queensland · CSIRO

Benchmarks indirect prompt injection attacks on LLM rankers, revealing encoder-decoder architectures are far more resilient than decoder-only models

Prompt Injection nlp
PDF Code
defense arXiv Feb 4, 2026 · 8w ago

SIDeR: Semantic Identity Decoupling for Unrestricted Face Privacy

Zhuosen Bao, Xia Du, Zheng Lin et al. · Xiamen University of Technology · University of Hong Kong +8 more

Generates unrestricted adversarial faces using diffusion models to evade facial recognition with 99% black-box success rate

Input Manipulation Attack visiongenerative
PDF
defense arXiv Nov 21, 2025 · Nov 2025

MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models

Yuqi Li, Junhao Dong, Chuanguang Yang et al. · Nanyang Technological University · Institute of Computing Technology +4 more

Defends VLMs against adversarial examples via dual multi-teacher distillation, gaining +4.32% robust accuracy with 2.3x training speedup

Input Manipulation Attack visionmultimodal
2 citations PDF Code
attack arXiv Nov 19, 2025 · Nov 2025

When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers

Zhaoxin Zhang, Borui Chen, Yiming Hu et al. · City University of Macau · University of Vienna +3 more

Novel LLM jailbreak using conceptual morphology triggers to shift ideological orientation in outputs without triggering safety filters

Prompt Injection nlp
PDF
defense arXiv Nov 16, 2025 · Nov 2025

DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection

Jialiang Shen, Jiyang Zheng, Yunqi Xue et al. · The University of Sydney · Shanghai Jiao Tong University +3 more

Proposes blur-robust AI-generated image detector via DINO-based teacher-student knowledge distillation for real-world motion degradation

Output Integrity Attack vision
1 citations PDF Code
attack arXiv Oct 2, 2025 · Oct 2025

Mirage Fools the Ear, Mute Hides the Truth: Precise Targeted Adversarial Attacks on Polyphonic Sound Event Detection Systems

Junjie Su, Weifei Jin, Yuxin Cao et al. · Beijing University of Posts and Telecommunications · National University of Singapore +2 more

First targeted adversarial attack framework for polyphonic SED, inserting or deleting sound events with precise region control via preservation loss

Input Manipulation Attack audio
PDF
survey arXiv Sep 25, 2025 · Sep 2025

Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models

Kang Wei, Xin Yuan, Fushuo Huo et al. · Southeast University · CSIRO +3 more

Comprehensive survey of security threats and countermeasures for diffusion models spanning robustness, privacy, backdoors, and content integrity

Input Manipulation Attack Output Integrity Attack Model Poisoning visiongenerativemultimodal
1 citations PDF
defense arXiv Sep 19, 2025 · Sep 2025

Backdoor Mitigation via Invertible Pruning Masks

Kealan Dunnett, Reza Arablouei, Dimity Miller et al. · Queensland University of Technology · CSIRO

Pruning-based backdoor defense using invertible masks and bi-level optimization to surgically remove backdoor behavior while preserving clean accuracy

Model Poisoning vision
PDF
tool arXiv Aug 11, 2025 · Aug 2025

From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users

Shahroz Tariq, Simon S. Woo, Priyanka Singh et al. · CSIRO · Sungkyunkwan University +1 more

Builds an explainable deepfake detection pipeline combining Grad-CAM, visual captioning, and LLM-generated narratives for non-expert users

Output Integrity Attack visionnlpmultimodal
PDF