Latest papers

20 papers
attack arXiv Mar 17, 2026 · 22d ago

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang et al. · University of Technology Sydney · Griffith University +2 more

Backdoor framework for semantic segmentation that introduces six attack vectors and optimized triggers, bypassing existing defenses

Model Poisoning Data Poisoning Attack vision
PDF
defense arXiv Mar 13, 2026 · 26d ago

Why Neural Structural Obfuscation Can't Kill White-Box Watermarks for Good!

Yanna Jiang, Guangsheng Yu, Qingyuan Yu et al. · University of Technology Sydney · Independent +2 more

Defeats Neural Structural Obfuscation attacks on model watermarks by canonicalizing neural networks to restore watermark verification (permutation-canonicalization sketch below)

Model Theft vision
PDF Code
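The core idea, undoing structural obfuscation by mapping a network back to a canonical form, can be illustrated for the simplest case: a hidden-neuron permutation in a fully connected layer. A minimal sketch, not the paper's method; the canonical ordering by incoming-weight norm and the restriction to pure permutations are assumptions for illustration.

```python
import numpy as np

def canonicalize_layer(W1, b1, W2):
    """Undo a hidden-neuron permutation in a 2-layer MLP (y = W2 @ act(W1 @ x + b1))
    by re-ordering hidden units into a fixed canonical order: descending L2 norm
    of their incoming weights. The network stays functionally identical, but its
    layout no longer depends on the obfuscator's permutation."""
    order = np.argsort(-np.linalg.norm(W1, axis=1))  # one norm per hidden unit
    return W1[order], b1[order], W2[:, order]        # permute rows and matching columns
```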
defense arXiv Mar 9, 2026 · 4w ago

Client-Cooperative Split Learning

Haiyu Deng, Yanna Jiang, Guangsheng Yu et al. · University of Technology Sydney · CSIRO Data61 +1 more

Defends split learning against activation inversion, label clustering, and model extraction via differential privacy and chained watermarking

Model Inversion Attack Model Theft federated-learning vision
PDF
attack arXiv Mar 1, 2026 · 5w ago

Turning Black Box into White Box: Dataset Distillation Leaks

Huajie Chen, Tianqing Zhu, Yuchen Zhong et al. · City University of Macau · CISPA Helmholtz Center for Information Security +2 more

Reveals that dataset distillation leaks training data via a three-stage attack: architecture inference, membership inference, and model inversion

Model Inversion Attack Membership Inference Attack vision
PDF
attack arXiv Feb 28, 2026 · 5w ago

Learning to Attack: A Bandit Approach to Adversarial Context Poisoning

Ray Telikani, Amir H. Gandomi · University of Technology Sydney

Black-box context poisoning attack on neural contextual bandits via inverse-RL surrogate modeling and GP-UCB-guided PGD perturbations (generic PGD sketch below)

Input Manipulation Attack reinforcement-learning
PDF
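The PGD component is a standard primitive; here is a minimal sketch of an L∞ PGD loop on a context vector, assuming a differentiable surrogate reward model. The paper's inverse-RL surrogate and GP-UCB guidance are not reproduced; `surrogate`, `target_arm`, and all hyperparameters are illustrative.

```python
import torch

def pgd_perturb(surrogate, context, target_arm, eps=0.1, alpha=0.02, steps=10):
    """Generic projected gradient descent on a 1-D context tensor.
    `surrogate` is any differentiable stand-in mapping a context to per-arm
    reward scores; the attacker pushes up the score of `target_arm`."""
    x0 = context.detach()
    x = x0.clone()
    for _ in range(steps):
        x.requires_grad_(True)
        scores = surrogate(x)                   # predicted reward per arm
        loss = -scores[target_arm]              # maximize the target arm's score
        grad = torch.autograd.grad(loss, x)[0]
        with torch.no_grad():
            x = x - alpha * grad.sign()         # signed gradient step
            x = x0 + (x - x0).clamp(-eps, eps)  # project back into the eps-ball
    return x.detach()
```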
survey arXiv Feb 24, 2026 · 6w ago

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Yanna Jiang, Delong Li, Haiyu Deng et al. · University of Technology Sydney · CSIRO

Surveys LLM agentic-skill security, covering marketplace supply-chain attacks, prompt injection via skill payloads, and trust-tiered execution

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design nlp reinforcement-learning
PDF
attack arXiv Feb 11, 2026 · 8w ago

Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

Shuyu Chang, Haiping Huang, Yanjun Zhang et al. · Nanjing University of Posts and Telecommunications · State Key Laboratory of Tibetan Intelligence +5 more

Backdoor attack on code models using sharpness-aware training and Gumbel-Softmax triggers for cross-dataset transferability and stealthiness

Model Poisoning nlp
PDF
defense arXiv Jan 28, 2026 · 10w ago

UnlearnShield: Shielding Forgotten Privacy against Unlearning Inversion

Lulu Xue, Shengshan Hu, Wei Lu et al. · Huazhong University of Science and Technology · Institute of Guizhou Aerospace Measuring and Testing Technology +2 more

Defends machine unlearning against inversion attacks that reconstruct erased training data via cosine-space perturbations

Model Inversion Attack vision
PDF
attack arXiv Jan 17, 2026 · 11w ago

Less Is More -- Until It Breaks: Security Pitfalls of Vision Token Compression in Large Vision-Language Models

Xiaomei Zhang, Zhaoxi Zhang, Leo Yu Zhang et al. · Griffith University · University of Technology Sydney +1 more

Adversarial attack exploits visual token compression in VLMs by perturbing token importance rankings, causing failures only under compressed inference

Input Manipulation Attack Prompt Injection vision nlp multimodal
PDF
attack arXiv Dec 18, 2025 · Dec 2025

Dual-View Inference Attack: Machine Unlearning Amplifies Privacy Exposure

Lulu Xue, Shengshan Hu, Linqiang Qian et al. · Huazhong University of Science and Technology · Tsinghua University +4 more

Black-box membership inference attack exploiting dual-model access after unlearning to infer membership of retained data via likelihood-ratio inference (generic sketch below)

Membership Inference Attack vision
2 citations PDF
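Likelihood-ratio membership inference is a known recipe; the toy sketch below shows one way two model views could be combined, assuming Gaussian fits to shadow-model losses. The paper's actual statistic and shadow setup may differ, and the `shadows` dictionary keys are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def lr_score(loss, member_losses, nonmember_losses):
    """Standard likelihood-ratio score: compare an observed per-sample loss
    against Gaussians fit to shadow-model losses of members vs. non-members."""
    mu_in, sd_in = member_losses.mean(), member_losses.std() + 1e-8
    mu_out, sd_out = nonmember_losses.mean(), nonmember_losses.std() + 1e-8
    return norm.logpdf(loss, mu_in, sd_in) - norm.logpdf(loss, mu_out, sd_out)

def dual_view_score(loss_before, loss_after, shadows):
    """Illustrative dual-view score: sum the likelihood ratios from the
    pre-unlearning and post-unlearning models (shadow stats assumed given)."""
    s1 = lr_score(loss_before, shadows["in_before"], shadows["out_before"])
    s2 = lr_score(loss_after, shadows["in_after"], shadows["out_after"])
    return s1 + s2  # higher score = more likely a member
```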
benchmark arXiv Dec 16, 2025 · Dec 2025

Black-Box Auditing of Quantum Model: Lifted Differential Privacy with Quantum Canaries

Baobao Song, Shiva Raj Pokhrel, Athanasios V. Vasilakos et al. · University of Technology Sydney · Deakin University +2 more

Black-box canary framework audits quantum ML models for memorization, empirically lower-bounding privacy leakage via quantum differential privacy

Membership Inference Attack
PDF
defense arXiv Dec 8, 2025 · Dec 2025

AdLift: Lifting Adversarial Perturbations to Safeguard 3D Gaussian Splatting Assets Against Instruction-Driven Editing

Ziming Hong, Tianyu Huang, Runnan Chen et al. · The University of Sydney · University of Technology Sydney +3 more

Defends 3D Gaussian Splatting assets from AI editing by lifting adversarial perturbations from 2D image space into 3D Gaussian parameters

Input Manipulation Attack vision generative
4 citations PDF Code
defense arXiv Nov 24, 2025 · Nov 2025

SpectraNet: FFT-assisted Deep Learning Classifier for Deepfake Face Detection

Nithira Jayarathne, Naveen Basnayake, Keshawa Jayasundara et al. · University of Moratuwa · University of Technology Sydney

Proposes an EfficientNet-B6 + FFT hybrid detector for deepfake faces, achieving 91% accuracy with balanced batch training (illustrative sketch below)

Output Integrity Attack vision
PDF
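A generic sketch of an FFT-assisted hybrid detector, concatenating backbone features with a pooled log-magnitude spectrum. Assumptions: the smaller efficientnet_b0 stands in for the paper's EfficientNet-B6, and the fusion head and pooling size are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class FFTAssistedDetector(nn.Module):
    """Illustrative FFT-assisted deepfake classifier: CNN features are fused
    with a pooled log-magnitude frequency spectrum of the input image."""
    def __init__(self):
        super().__init__()
        self.backbone = models.efficientnet_b0(weights=None)
        self.backbone.classifier = nn.Identity()   # expose 1280-dim features
        self.spec_pool = nn.AdaptiveAvgPool2d(8)   # 8x8 pooled spectrum
        self.head = nn.Linear(1280 + 8 * 8, 2)     # real vs. fake

    def forward(self, x):                           # x: (B, 3, H, W)
        feats = self.backbone(x)                    # (B, 1280)
        gray = x.mean(dim=1)                        # (B, H, W) grayscale
        spec = torch.fft.fftshift(torch.fft.fft2(gray), dim=(-2, -1))
        logmag = torch.log1p(spec.abs()).unsqueeze(1)   # (B, 1, H, W)
        spec_feat = self.spec_pool(logmag).flatten(1)   # (B, 64)
        return self.head(torch.cat([feats, spec_feat], dim=1))
```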
defense arXiv Nov 21, 2025 · Nov 2025

MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models

Yuqi Li, Junhao Dong, Chuanguang Yang et al. · Nanyang Technological University · Institute of Computing Technology +4 more

Defends VLMs against adversarial examples via dual multi-teacher distillation, gaining +4.32% robust accuracy with a 2.3x training speedup

Input Manipulation Attack vision multimodal
2 citations PDF Code
benchmark arXiv Oct 18, 2025 · Oct 2025

Scaling Laws for Deepfake Detection

Wenhao Wang, Longqi Cai, Taihong Xiao et al. · University of Technology Sydney · Google DeepMind

Discovers power-law scaling for deepfake detection using ScaleDF, the largest such dataset with 14M+ images across 51 real domains and 102 generation methods (fit sketch below)

Output Integrity Attack vision generative
1 citation PDF
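Power-law scaling fits of the kind reported are typically done in log-log space; a minimal sketch, where the inputs are placeholders supplied by the caller, not ScaleDF measurements.

```python
import numpy as np

def fit_power_law(dataset_sizes, errors):
    """Least-squares fit of err ≈ a * N**(-b) by linear regression in log-log
    space. Returns (a, b); b > 0 means error shrinks as data scales up."""
    slope, log_a = np.polyfit(np.log(dataset_sizes), np.log(errors), 1)
    return np.exp(log_a), -slope
```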
attack NDSS Sep 11, 2025 · Sep 2025

Character-Level Perturbations Disrupt LLM Watermarks

Zhaoxi Zhang, Xiaomei Zhang, Yanjun Zhang et al. · University of Technology Sydney · Griffith University +1 more

Attacks LLM text watermarks via character-level perturbations that disrupt tokenization, defeating five watermarking schemes with minimal detector access (generic sketch below)

Output Integrity Attack nlp
PDF
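The general mechanism, small character-level edits that change how watermarked text tokenizes, can be sketched with zero-width-space insertion; the paper's perturbation budget and search strategy are not reproduced here, and the insertion rate is an illustrative assumption.

```python
import random

ZWSP = "\u200b"  # zero-width space: invisible to readers, visible to tokenizers

def perturb(text, rate=0.05, seed=0):
    """Randomly insert zero-width spaces after characters so the text looks
    unchanged but splits into different tokens, degrading watermark detection."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if ch != " " and rng.random() < rate:
            out.append(ZWSP)
    return "".join(out)
```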
defense arXiv Sep 2, 2025 · Sep 2025

Privacy-Utility Trade-off in Data Publication: A Bilevel Optimization Framework with Curvature-Guided Perturbation

Yi Yin, Guangquan Zhang, Hua Zuo et al. · University of Technology Sydney

Bilevel optimization framework that perturbs training data via manifold curvature guidance to defend against membership inference attacks while preserving downstream utility

Membership Inference Attack vision generative
PDF
defense arXiv Aug 6, 2025 · Aug 2025

Isolate Trigger: Detecting and Eliminating Adaptive Backdoor Attacks

Chengrui Sun, Hua Zhang, Haoran Gao et al. · Beijing University of Posts and Telecommunications · China Mobile Research Institute +2 more

Defends against adaptive backdoor attacks by isolating hidden triggers from benign features and applying unlearning-based model repair

Model Poisoning vision
PDF
defense arXiv Aug 3, 2025 · Aug 2025

MiraGe: Multimodal Discriminative Representation Learning for Generalizable AI-Generated Image Detection

Kuo Shi, Jie Lu, Shanshan Ye et al. · University of Technology Sydney

Proposes CLIP-based discriminative representation learning to detect AI-generated images, generalizing to unseen generators such as Sora

Output Integrity Attack vision multimodal
PDF
survey arXiv Jan 2, 2025 · Jan 2025

State-of-the-art AI-based Learning Approaches for Deepfake Generation and Detection, Analyzing Opportunities, Threading through Pros, Cons, and Future Prospects

Harshika Goyal, Mohammad Saif Wajid, Mohd Anas Wajid et al. · Indian Institute of Technology · Tecnológico de Monterrey +6 more

Surveys ~400 papers on deepfake generation (GANs, VAEs, Transformers) and detection, covering benchmark datasets and future challenges

Output Integrity Attack vision generative
5 citations PDF