Latest papers

30 papers
defense arXiv Apr 29, 2026 · 22d ago

Attribution-Guided Multimodal Deepfake Detection via Cross-Modal Forensic Fingerprints

Wasim Ahmad, Wei Zhang, Xuerui Mao · Beijing Institute of Technology

Multimodal deepfake detector that jointly learns attribution and detection by aligning generator-specific fingerprints across audio and video

Output Integrity Attack multimodal vision audio
PDF
attack arXiv Apr 26, 2026 · 25d ago

Do Protective Perturbations Really Protect Portrait Privacy under Real-world Image Transformations?

Ruiqing Sun, Xingshan Yao, Zhijing Wu et al. · Beijing Institute of Technology

Attacks pixel-level portrait privacy protections by purifying adversarial perturbations through real-world image transformations like scaling and compression

Output Integrity Attack vision generative
PDF
attack arXiv Apr 26, 2026 · 25d ago

Spore: Efficient and Training-Free Privacy Extraction Attack on LLMs via Inference-Time Hybrid Probing

Yu Cui, Ruiqing Yue, Hang Fu et al. · Beijing Institute of Technology · Chinese Academy of Sciences +3 more

Extracts private information from LLM agent memory via single-query hybrid probing in black-box and gray-box settings

Model Inversion Attack Sensitive Information Disclosure nlp
PDF
defense arXiv Apr 14, 2026 · 5w ago

Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis

Miao Liu, Fangda Wei, Jing Wang et al. · Beijing Institute of Technology · University of Science and Technology Beijing

Detects deepfakes in listening scenarios using motion analysis and audio-guided fusion, outperforming speaking-focused detectors

Output Integrity Attack multimodal vision audio
PDF Code
defense arXiv Apr 9, 2026 · 6w ago

MSCT: Differential Cross-Modal Attention for Deepfake Detection

Fangda Wei, Miao Liu, Yingxue Wang et al. · Beijing Institute of Technology · China Academy of Electronics and Information Technology

Transformer-based deepfake detector using multi-scale temporal features and differential cross-modal attention to identify audio-visual inconsistencies

Output Integrity Attack multimodal audio vision
PDF
benchmark arXiv Apr 9, 2026 · 6w ago

AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan

Yuankun Xie, Haonan Cheng, Jiayi Zhou et al. · Communication University of China · Ant Group +3 more

Benchmark challenge for detecting AI-generated speech, sound, singing, and music across diverse generation methods and real-world conditions

Output Integrity Attack audio multimodal nlp
PDF
attack arXiv Mar 13, 2026 · 9w ago

CtrlAttack: A Unified Attack on World-Model Control in Diffusion Models

Shuhan Xu, Siyuan Liang, Hongling Zheng et al. · Wuhan University · Nanyang Technological University +1 more

Adversarial attack on diffusion I2V models that disrupts temporal consistency via low-dimensional velocity field perturbations

Input Manipulation Attack vision generative
PDF
benchmark arXiv Mar 9, 2026 · 10w ago

The Struggle Between Continuation and Refusal: A Mechanistic Analysis of the Continuation-Triggered Jailbreak in LLMs

Yonghong Deng, Zhen Yang, Ping Jian et al. · Beijing Institute of Technology

Mechanistic analysis reveals LLM jailbreaks arise from competition between safety-aligned attention heads and intrinsic continuation-drive heads

Prompt Injection nlp
PDF
defense arXiv Mar 6, 2026 · 10w ago

BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation

Feiran Li, Qianqian Xu, Shilong Bao et al. · Institute of Information Engineering · University of Chinese Academy of Sciences +4 more

Black-box backdoor detector for text-to-image diffusion models using semantic instruction-response deviation across varied prompts

Model Poisoning vision generative multimodal
PDF Code
defense arXiv Mar 3, 2026 · 11w ago

RAIN: Secure and Robust Aggregation under Shuffle Model of Differential Privacy

Yuhang Li, Yajie Wang, Xiangyun Tang et al. · Beijing Institute of Technology · Minzu University of China

Defends federated learning against Byzantine poisoning and shuffler tampering under Shuffle-DP with verifiable secret-shared aggregation

Data Poisoning Attack federated-learning
PDF
defense arXiv Feb 5, 2026 · Feb 2026

ALIEN: Analytic Latent Watermarking for Controllable Generation

Liangqi Lei, Keke Gai, Jing Yu et al. · Beijing Institute of Technology · Minzu University of China +1 more

Embeds analytically-derived watermarks in diffusion model latents for content provenance with improved quality and attack robustness

Output Integrity Attack vision generative
PDF Code
attack arXiv Jan 9, 2026 · Jan 2026

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

Zhaoqi Wang, Zijian Zhang, Daqing He et al. · Beijing Institute of Technology · University of Auckland +2 more

Jailbreaks aligned LLMs by disguising malicious queries as tool calls and using RL to iteratively escalate response harmfulness across turns

Prompt Injection Insecure Plugin Design nlp
PDF
defense arXiv Dec 31, 2025 · Dec 2025

Towards Provably Secure Generative AI: Reliable Consensus Sampling

Yu Cui, Hang Fu, Sicheng Pan et al. · Beijing Institute of Technology · Tsinghua University

Provably secure consensus sampling algorithm for LLM groups that tolerates Byzantine adversarial models and eliminates unsafe output abstention

Prompt Injection nlp generative
PDF
defense arXiv Dec 24, 2025 · Dec 2025

Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face

Rui-qing Sun, Xingshan Yao, Tian Lan et al. · Beijing Institute of Technology

Defends portrait videos against 3D deepfake generation by injecting adversarial perturbations that disrupt 3D geometry acquisition, with a 47x speedup

Output Integrity Attack vision
PDF Code
attack arXiv Dec 3, 2025 · Dec 2025

Tipping the Dominos: Topology-Aware Multi-Hop Attacks on LLM-Based Multi-Agent Systems

Ruichao Liang, Le Yin, Jing Chen et al. · Wuhan University · Nanyang Technological University +1 more

Topology-aware multi-hop indirect injection attack that chains through LLM multi-agent systems to reach high-value targets, achieving a 40–78% success rate

Prompt Injection Excessive Agency nlp
PDF
benchmark arXiv Nov 24, 2025 · Nov 2025

Can LLMs Threaten Human Survival? Benchmarking Potential Existential Threats from LLMs via Prefix Completion

Yu Cui, Yifei Liu, Hang Fu et al. · Beijing Institute of Technology · Tsinghua University

Benchmarks existential safety risks in LLMs via prefix completion jailbreaks, including dangerous autonomous tool-calling behavior

Prompt Injection Excessive Agency nlp multimodal
1 citation PDF Code
attack arXiv Nov 20, 2025 · Nov 2025

Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models

Yijun Yang, Lichao Wang, Jianping Zhang et al. · The Chinese University of Hong Kong · Beijing Institute of Technology +1 more

Adversarial image attack jailbreaks GPT-4o, Gemini-Pro, and Llama-4 by hiding harmful instructions inside competing visual objectives, transferring across VLMs

Input Manipulation Attack Prompt Injection vision multimodal nlp
PDF Code
defense arXiv Nov 4, 2025 · Nov 2025

Nesterov-Accelerated Robust Federated Learning Over Byzantine Adversaries

Lihan Xu, Yanjie Dong, Gang Wang et al. · Shenzhen MSU-BIT University · Beijing Institute of Technology

Defends federated learning against Byzantine adversaries by combining Nesterov momentum with robust aggregation for faster-converging training

Data Poisoning Attack federated-learning
1 citation PDF
attack Mathematics Oct 29, 2025 · Oct 2025

Bilevel Models for Adversarial Learning and A Case Study

Yutong Zheng, Qingna Li · Beijing Institute of Technology

Proposes bilevel optimization models to design adversarial perturbations that break convex clustering via perturbation analysis and δ-measure deviation

Input Manipulation Attack tabular
PDF
attack arXiv Oct 16, 2025 · Oct 2025

A Hard-Label Black-Box Evasion Attack against ML-based Malicious Traffic Detection Systems

Zixuan Liu, Yi Zhao, Zhuotao Liu et al. · Tsinghua University · Zhongguancun Lab +1 more

RL-based hard-label black-box attack crafts adversarial traffic mimicking benign patterns to evade ML-based network intrusion detectors

Input Manipulation Attack timeseries
PDF