Latest papers

24 papers
attack arXiv Mar 13, 2026

CtrlAttack: A Unified Attack on World-Model Control in Diffusion Models

Shuhan Xu, Siyuan Liang, Hongling Zheng et al. · Wuhan University · Nanyang Technological University +1 more

Adversarial attack on diffusion I2V models that disrupts temporal consistency via low-dimensional velocity field perturbations

Input Manipulation Attack vision generative
PDF
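
A minimal sketch of the perturbation idea, assuming a toy differentiable stand-in for the I2V generator (ToyI2V, the basis size k, and the temporal-inconsistency loss below are illustrative, not the paper's construction): attack only k coefficients over a fixed low-dimensional basis and push consecutive generated frames apart.

```python
# Low-dimensional adversarial perturbation against temporal consistency.
import torch
import torch.nn as nn

class ToyI2V(nn.Module):
    """Differentiable stand-in: maps one image to T video frames."""
    def __init__(self, frames=8):
        super().__init__()
        self.frames = frames
        self.conv = nn.Conv2d(3, 3 * frames, 3, padding=1)
    def forward(self, img):                       # img: (1, 3, H, W)
        out = self.conv(img)                      # (1, 3*T, H, W)
        return out.view(1, self.frames, 3, *img.shape[-2:])

model = ToyI2V()
img = torch.rand(1, 3, 32, 32)

k = 4                                             # low-dimensional basis size
basis = torch.randn(k, *img.shape[1:])            # fixed random directions
coeff = torch.zeros(k, requires_grad=True)        # only k attack parameters

opt = torch.optim.Adam([coeff], lr=0.05)
for _ in range(100):
    delta = torch.einsum("k,kchw->chw", coeff, basis).unsqueeze(0)
    video = model((img + delta).clamp(0, 1))
    # Maximize frame-to-frame divergence => minimize its negation.
    inconsistency = (video[:, 1:] - video[:, :-1]).pow(2).mean()
    loss = -inconsistency
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                         # keep perturbation bounded
        coeff.clamp_(-1.0, 1.0)
```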
benchmark arXiv Mar 9, 2026

The Struggle Between Continuation and Refusal: A Mechanistic Analysis of the Continuation-Triggered Jailbreak in LLMs

Yonghong Deng, Zhen Yang, Ping Jian et al. · Beijing Institute of Technology

Mechanistic analysis reveals LLM jailbreaks arise from competition between safety-aligned attention heads and intrinsic continuation-driven heads

Prompt Injection nlp
PDF
defense arXiv Mar 6, 2026

BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation

Feiran Li, Qianqian Xu, Shilong Bao et al. · Institute of Information Engineering · University of Chinese Academy of Sciences +4 more

Black-box backdoor detector for text-to-image diffusion models using semantic instruction-response deviation across varied prompts

Model Poisoning vision generative multimodal
PDF Code
defense arXiv Mar 3, 2026

RAIN: Secure and Robust Aggregation under Shuffle Model of Differential Privacy

Yuhang Li, Yajie Wang, Xiangyun Tang et al. · Beijing Institute of Technology · Minzu University of China

Defends federated learning against Byzantine poisoning and shuffler tampering under Shuffle-DP with verifiable secret-shared aggregation

Data Poisoning Attack federated-learning
PDF
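
A minimal additive secret-sharing sketch of the aggregation idea (RAIN's verifiable shares, Shuffle-DP noise, and Byzantine filtering are not modeled; the scheme below is the textbook additive one): each client splits its update into shares that sum to the update, so no single server sees it, yet the server totals reconstruct the global sum.

```python
# Additive secret-sharing for private aggregation of client updates.
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_servers, dim = 5, 3, 4
updates = rng.normal(size=(n_clients, dim))       # one model update per client

# Client-side sharing: shares[c, s] is client c's share for server s.
shares = rng.normal(size=(n_clients, n_servers, dim))
shares[:, -1] = updates - shares[:, :-1].sum(axis=1)  # shares sum to the update

# Each server only ever sees its own column of shares.
server_totals = shares.sum(axis=0)                # (n_servers, dim)

# Combining the server totals reconstructs the aggregate exactly.
aggregate = server_totals.sum(axis=0)
assert np.allclose(aggregate, updates.sum(axis=0))
```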
defense arXiv Feb 5, 2026

ALIEN: Analytic Latent Watermarking for Controllable Generation

Liangqi Lei, Keke Gai, Jing Yu et al. · Beijing Institute of Technology · Minzu University of China +1 more

Embeds analytically-derived watermarks in diffusion model latents for content provenance with improved quality and attack robustness

Output Integrity Attack vision generative
PDF Code
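
A minimal latent-watermark sketch under strong simplifying assumptions (a fixed secret direction and correlation-based detection; the paper's analytic derivation and quality constraints are not reproduced):

```python
# Embed a secret direction into a latent; detect it by correlation.
import numpy as np

rng = np.random.default_rng(42)
dim, strength = 4096, 5.0

key = rng.normal(size=dim)
key /= np.linalg.norm(key)                        # unit-norm secret direction

def embed(latent):
    return latent + strength * key

def detect(latent, threshold=strength / 2):
    return float(latent @ key) > threshold        # correlation with the key

clean = rng.normal(size=dim)
print(detect(clean), detect(embed(clean)))        # typically: False True
```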
attack arXiv Jan 9, 2026

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

Zhaoqi Wang, Zijian Zhang, Daqing He et al. · Beijing Institute of Technology · University of Auckland +2 more

Jailbreaks aligned LLMs by disguising malicious queries as tool calls and using RL to iteratively escalate response harmfulness across turns

Prompt Injection Insecure Plugin Design nlp
PDF
defense arXiv Dec 31, 2025

Towards Provably Secure Generative AI: Reliable Consensus Sampling

Yu Cui, Hang Fu, Sicheng Pan et al. · Beijing Institute of Technology · Tsinghua University

Provably secure consensus sampling algorithm for LLM groups that tolerates Byzantine adversarial models and eliminates unsafe output abstention

Prompt Injection nlp generative
PDF
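
A minimal voting skeleton for the consensus idea, assuming n >= 2f + 1 models and exact output matching (the paper's provable sampling guarantees, and its elimination of abstention, go well beyond this sketch):

```python
# Majority-style consensus across an LLM group with up to f Byzantine members.
from collections import Counter

def consensus_sample(models, prompt, f):
    """Return the plurality output if it has at least f + 1 votes."""
    assert len(models) >= 2 * f + 1, "need n >= 2f + 1 models"
    outputs = [m(prompt) for m in models]
    answer, votes = Counter(outputs).most_common(1)[0]
    if votes >= f + 1:            # at least one honest model produced it
        return answer
    return None                   # abstain (the paper aims to avoid this case)

# Toy usage: two honest models and one Byzantine model (f = 1).
honest = lambda p: "I can't help with that."
byzantine = lambda p: "Sure, here is how..."
print(consensus_sample([honest, honest, byzantine], "harmful request", f=1))
```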
defense arXiv Dec 24, 2025

Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face

Rui-qing Sun, Xingshan Yao, Tian Lan et al. · Beijing Institute of Technology

Defends portrait videos against 3D deepfake generation by injecting adversarial perturbations that disrupt 3D geometry acquisition with 47x speedup

Output Integrity Attack vision
PDF Code
attack arXiv Dec 3, 2025

Tipping the Dominos: Topology-Aware Multi-Hop Attacks on LLM-Based Multi-Agent Systems

Ruichao Liang, Le Yin, Jing Chen et al. · Wuhan University · Nanyang Technological University +1 more

Topology-aware multi-hop indirect injection attack chains through LLM multi-agent systems to reach high-value targets, achieving 40–78% success rate

Prompt Injection Excessive Agency nlp
PDF
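
A toy illustration of the topology-aware viewpoint: the attacker can inject only at an entry agent, so the payload must survive a multi-hop forwarding path to reach the high-value target. The agent graph and BFS below are illustrative, not the paper's attack.

```python
# Count the forwarding hops an injected payload needs to reach a target agent.
from collections import deque

# Directed communication graph between agents (illustrative).
topology = {
    "web_reader": ["summarizer"],
    "summarizer": ["planner"],
    "planner": ["executor"],      # high-value agent with tool access
    "executor": [],
}

def hops_to_target(graph, entry, target):
    queue, seen = deque([(entry, 0)]), {entry}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None                   # target unreachable from this entry point

print(hops_to_target(topology, "web_reader", "executor"))  # 3 hops
```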
benchmark arXiv Nov 24, 2025

Can LLMs Threaten Human Survival? Benchmarking Potential Existential Threats from LLMs via Prefix Completion

Yu Cui, Yifei Liu, Hang Fu et al. · Beijing Institute of Technology · Tsinghua University

Benchmarks existential safety risks in LLMs via prefix completion jailbreaks, including dangerous autonomous tool-calling behavior

Prompt Injection Excessive Agency nlp multimodal
1 citation PDF Code
attack arXiv Nov 20, 2025

Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models

Yijun Yang, Lichao Wang, Jianping Zhang et al. · The Chinese University of Hong Kong · Beijing Institute of Technology +1 more

Adversarial image attack jailbreaks GPT-4o, Gemini-Pro, and Llama-4 by hiding harmful instructions inside competing visual objectives, transferring across VLMs

Input Manipulation Attack Prompt Injection vision multimodal nlp
PDF Code
defense arXiv Nov 4, 2025

Nesterov-Accelerated Robust Federated Learning Over Byzantine Adversaries

Lihan Xu, Yanjie Dong, Gang Wang et al. · Shenzhen MSU-BIT University · Beijing Institute of Technology

Defends federated learning from Byzantine adversaries by combining Nesterov momentum with robust aggregation for faster convergent training

Data Poisoning Attack federated-learning
1 citation PDF
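
A minimal sketch combining the two ingredients named in the summary, with a coordinate-wise median standing in for the paper's aggregation rule (the toy objective and hyperparameters are illustrative):

```python
# Nesterov-style momentum on top of robust (median) aggregation.
import numpy as np

rng = np.random.default_rng(1)
dim, n_honest, n_byz = 10, 8, 3
w = np.zeros(dim)                                 # global model
v = np.zeros(dim)                                 # momentum buffer
lr, mu = 0.1, 0.9

for step in range(50):
    # Honest updates pull toward the optimum at 1.0; Byzantine ones are noise.
    honest = [-(w - 1.0) + 0.1 * rng.normal(size=dim) for _ in range(n_honest)]
    byz = [100.0 * rng.normal(size=dim) for _ in range(n_byz)]
    agg = np.median(np.stack(honest + byz), axis=0)   # robust aggregation
    v_prev = v
    v = mu * v + lr * agg
    w = w + mu * (v - v_prev) + lr * agg          # Nesterov-style update

print(np.round(w[:3], 2))                         # approaches the optimum at 1.0
```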
attack Mathematics Oct 29, 2025

Bilevel Models for Adversarial Learning and A Case Study

Yutong Zheng, Qingna Li · Beijing Institute of Technology

Proposes bilevel optimization models to design adversarial perturbations that break convex clustering via perturbation analysis and δ-measure deviation

Input Manipulation Attack tabular
PDF
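
One plausible way to write the generic bilevel structure, with convex (sum-of-norms) clustering as the inner problem; the attacker objective and norm bound are illustrative notation, not the paper's exact model:

```latex
% Outer attacker picks a bounded data perturbation \delta; the inner problem
% is convex clustering of the perturbed data A + \delta.
\[
\begin{aligned}
\max_{\|\delta\| \le \epsilon} \ & \mathcal{L}_{\mathrm{atk}}\!\left(X^\star(\delta)\right) \\
\text{s.t.}\ & X^\star(\delta) \in \arg\min_{X} \ \tfrac12\,\|X - (A + \delta)\|_F^2
  \;+\; \gamma \sum_{i<j} \|x_i - x_j\|_2
\end{aligned}
\]
```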
attack arXiv Oct 16, 2025

A Hard-Label Black-Box Evasion Attack against ML-based Malicious Traffic Detection Systems

Zixuan Liu, Yi Zhao, Zhuotao Liu et al. · Tsinghua University · Zhongguancun Lab +1 more

RL-based hard-label black-box attack crafts adversarial traffic mimicking benign patterns to evade ML-based network intrusion detectors

Input Manipulation Attack timeseries
PDF
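
A minimal hard-label (decision-only) sketch: the attacker sees nothing but the detector's verdict and random-searches for the closest perturbation that flips it. The toy detector, features, and validity constraint stand in for the paper's RL-based traffic crafting.

```python
# Decision-only evasion via random search over feature perturbations.
import numpy as np

rng = np.random.default_rng(7)

def detector(x):                                  # hard-label black box (toy)
    return "malicious" if x.sum() > 5.0 else "benign"

x = np.array([2.0, 2.0, 2.0])                     # malicious flow features
assert detector(x) == "malicious"

best, best_dist = None, np.inf
for _ in range(2000):
    delta = rng.normal(scale=0.5, size=x.shape)
    cand = np.clip(x + delta, 0, None)            # keep features non-negative
    if detector(cand) == "benign":                # only the label is observed
        dist = np.linalg.norm(cand - x)
        if dist < best_dist:
            best, best_dist = cand, dist

print(best, round(best_dist, 3))                  # closest evading variant found
```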
attack arXiv Oct 14, 2025

Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs

Bowen Fan, Zhilin Guo, Xunkai Li et al. · Beijing Institute of Technology · Shandong University +1 more

Adversarial attack on Graph-LLMs jointly perturbing graph topology and text node features to expose multi-dimensional vulnerabilities

Input Manipulation Attack graph nlp
PDF Code
attack arXiv Oct 5, 2025

VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy

Yu Cui, Sicheng Pan, Yifei Liu et al. · Beijing Institute of Technology · Tsinghua University

Indirect prompt injection attack manipulates LLM-integrated apps to solicit user PII in batches under black-box settings

Prompt Injection nlp
3 citations PDF
defense arXiv Sep 29, 2025

H+: An Efficient Similarity-Aware Aggregation for Byzantine Resilient Federated Learning

Shiyuan Zuo, Rongfei Fan, Cheng Zhan et al. · Beijing Institute of Technology · Sun Yat-Sen University +2 more

Defends federated learning against Byzantine poisoning via efficient random-segment similarity-aware aggregation, with or without clean data

Data Poisoning Attack federated-learning
PDF
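
A minimal sketch of segment-based similarity filtering (the segment size, reference statistic, and threshold are illustrative assumptions, not H+'s exact rule): compare updates on one random coordinate slice to cut cost, drop outliers, then average the rest.

```python
# Byzantine filtering by similarity on a random coordinate segment.
import numpy as np

rng = np.random.default_rng(3)
dim, seg_len, n_honest, n_byz = 1000, 64, 8, 3

honest = rng.normal(loc=1.0, scale=0.1, size=(n_honest, dim))
byz = rng.normal(loc=-5.0, scale=0.1, size=(n_byz, dim))     # poisoned updates
updates = np.vstack([honest, byz])

seg = rng.choice(dim, size=seg_len, replace=False)           # random segment
ref = np.median(updates[:, seg], axis=0)                     # robust reference
dist = np.linalg.norm(updates[:, seg] - ref, axis=1)
keep = dist <= 2.0 * np.median(dist)                         # drop clear outliers

agg = updates[keep].mean(axis=0)
print(keep, np.round(agg[:3], 2))                 # Byzantine rows filtered out
```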
attack arXiv Sep 28, 2025

Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning

Zhaoqi Wang, Daqing He, Zijian Zhang et al. · Beijing Institute of Technology · Hefei University of Technology +1 more

Attacks LLM alignment with RL-driven formalization of jailbreak prompts combined with GraphRAG knowledge reuse

Prompt Injection nlp
PDF
defense arXiv Sep 16, 2025

EByFTVeS: Efficient Byzantine Fault Tolerant-based Verifiable Secret-sharing in Distributed Privacy-preserving Machine Learning

Zhen Li, Zijian Zhang, Wenjin Yang et al. · Beijing Institute of Technology · The University of Auckland

Proposes a timing-based Byzantine model poisoning attack on BFT-VSS distributed ML and defends with consensus-synchronized secret sharing

Data Poisoning Attack federated-learning
PDF
benchmark arXiv Sep 9, 2025

SafeToolBench: Pioneering a Prospective Benchmark to Evaluating Tool Utilization Safety in LLMs

Hongfei Xia, Hongru Wang, Zeming Liu et al. · Beijing Institute of Technology · The Chinese University of Hong Kong +2 more

Proposes benchmark and safety framework for prospective LLM tool-call risk assessment before irreversible harmful actions execute

Insecure Plugin Design Excessive Agency nlp
PDF Code
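
A minimal sketch of the prospective-gating idea: assess a proposed tool call against a risk policy before it executes, rather than auditing after an irreversible action. The policy rules and labels below are illustrative, not SafeToolBench's framework.

```python
# Gate tool calls on a risk policy before execution.
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

IRREVERSIBLE = {"delete_file", "send_payment", "send_email"}

def assess(call: ToolCall) -> str:
    if call.name in IRREVERSIBLE:
        return "block"                      # or escalate to human review
    if any("password" in str(v).lower() for v in call.args.values()):
        return "block"                      # sensitive data leaving the system
    return "allow"

calls = [
    ToolCall("search_web", {"query": "weather"}),
    ToolCall("send_payment", {"to": "attacker", "amount": 999}),
]
for call in calls:
    print(call.name, "->", assess(call))    # execute only on "allow"
```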