Latest papers

12 papers
defense · arXiv · Mar 26, 2026

SAVe: Self-Supervised Audio-visual Deepfake Detection Exploiting Visual Artifacts and Audio-visual Misalignment

Sahibzada Adil Shahzad, Ammarah Hashmi, Junichi Yamagishi et al. · National Institute of Informatics · Academia Sinica +2 more

Self-supervised multimodal deepfake detector trained on real videos, detecting visual tampering artifacts and audio-visual lip-sync inconsistencies

Output Integrity Attack · multimodal · vision · audio
PDF
tool · arXiv · Mar 18, 2026

EvoGuard: An Extensible Agentic RL-based Framework for Practical and Evolving AI-Generated Image Detection

Chenyang Zhu, Maorong Wang, Jun Liu et al. · The University of Tokyo · National Institute of Informatics

Agentic framework orchestrating multiple AIGI detectors via reinforcement learning for extensible, training-free AI-generated image detection

Output Integrity Attack · vision · multimodal · nlp
PDF
defense · arXiv · Feb 26, 2026

Deepfake Word Detection by Next-token Prediction using Fine-tuned Whisper

Hoan My Tran, Xin Wang, Wanying Ge et al. · Université de Rennes · National Institute of Informatics

Fine-tunes Whisper to detect synthetic deepfake words in audio via next-token prediction with special boundary tokens

Output Integrity Attack · audio
PDF
attack · arXiv · Jan 28, 2026

Self Voice Conversion as an Attack against Neural Audio Watermarking

Yigitcan Özer, Wanying Ge, Zhe Zhang et al. · National Institute of Informatics

Attacks audio watermarks by passing speech through self voice conversion, stripping embedded marks while preserving speaker identity and content

Output Integrity Attack · audio
1 citation · PDF
attack · arXiv · Jan 17, 2026

Gradient Structure Estimation under Label-Only Oracles via Spectral Sensitivity

Jun Liu, Leo Yu Zhang, Fengpeng Li et al. · University of Macau · National Institute of Informatics +2 more

Hard-label black-box adversarial attack using frequency-domain initialization and pattern-driven optimization to recover gradient sign information

Input Manipulation Attack · vision
PDF · Code
defense · arXiv · Dec 17, 2025

SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification

Hongbo Wang, MaungMaung AprilPyone, Isao Echizen · The University of Tokyo · National Institute of Informatics +1 more

Neuron-level white-box defense suppresses toxic expert neurons in VLMs, cutting harmful outputs from 48% to 2.5% under adversarial jailbreaks

Prompt Injection · nlp · multimodal · vision
1 citation · PDF · Code
attack · arXiv · Oct 30, 2025

FGGM: Formal Grey-box Gradient Method for Attacking DRL-based MU-MIMO Scheduler

Thanh Le, Hai Duong, Yusheng Ji et al. · The Graduate University for Advanced Studies · National Institute of Informatics +2 more

Grey-box attack on DRL-based 5G MU-MIMO schedulers that uses polytope abstract domains to craft adversarial CSI inputs, degrading victim throughput by 70%

Input Manipulation Attack · reinforcement-learning
1 citation · PDF
defense · Asia-Pacific Signal and Inform... · Oct 10, 2025

Uncolorable Examples: Preventing Unauthorized AI Colorization via Perception-Aware Chroma-Restrictive Perturbation

Yuki Nii, Futa Waseda, Ching-Chun Chang et al. · The University of Tokyo · National Institute of Informatics

Adversarial perturbations embedded in grayscale images disrupt AI colorization models, preventing copyright infringement through unauthorized colorization

Output Integrity Attack · vision · generative
PDF
defense · arXiv · Oct 6, 2025

WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection

Xi Xuan, Xuechen Liu, Wenxin Zhang et al. · University of Eastern Finland · National Institute of Informatics +4 more

Novel wavelet prompt-tuning architecture for speech deepfake detection, outperforming SOTA on two benchmarks with far fewer trainable parameters

Output Integrity Attack · audio
1 citation · PDF · Code
defense · arXiv · Sep 24, 2025

ThinkFake: Reasoning in Multimodal Large Language Models for AI-Generated Image Detection

Tai-Ming Huang, Wei-Tung Lin, Kai-Lung Hua et al. · National Taiwan University · Academia Sinica +3 more

Detects AI-generated images via MLLM step-by-step reasoning trained with GRPO reinforcement learning, achieving strong zero-shot generalization

Output Integrity Attack · vision · multimodal
3 citations · 1 influential · PDF
defense · arXiv · Sep 22, 2025

Distributionally Robust Safety Verification of Neural Networks via Worst-Case CVaR

Masako Kishida · National Institute of Informatics

Extends SDP-based neural network verification with worst-case CVaR to certify safety under distributional input uncertainty and tail risk

Input Manipulation Attack
PDF
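
For context on the risk measure named in this entry: worst-case CVaR takes the standard conditional value-at-risk and maximizes it over an ambiguity set of input distributions. A minimal sketch of the textbook definitions (notation mine, not taken from the paper):

```latex
% CVaR at level \alpha \in (0,1] for a loss X, Rockafellar–Uryasev form:
\mathrm{CVaR}_\alpha(X) \;=\; \inf_{t \in \mathbb{R}}
  \left\{ t + \tfrac{1}{\alpha}\, \mathbb{E}\!\left[(X - t)_+\right] \right\}

% Worst-case CVaR over an ambiguity set \mathcal{P} of distributions:
\sup_{P \in \mathcal{P}} \mathrm{CVaR}_\alpha^{P}(X)
```

Certifying safety then amounts to bounding this worst-case quantity for the network's loss over all distributions in the ambiguity set, which the paper encodes into the SDP-based verification framework.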
attack · arXiv · Jan 4, 2025

BADTV: Unveiling Backdoor Threats in Third-Party Task Vectors

Chia-Yi Hsu, Yu-Lin Tsai, Yu Zhe et al. · National Yang Ming Chiao Tung University · University of Tsukuba +2 more

Backdoor attack on task vectors that persists across task learning, forgetting, and analogy arithmetic operations, evading all tested defenses

Model Poisoning · Transfer Learning Attack · vision · nlp · multimodal
2 citations · PDF