Latest papers

15 papers
defense arXiv Mar 19, 2026 · 18d ago

Revisiting Label Inference Attacks in Vertical Federated Learning: Why They Are Vulnerable and How to Defend

Yige Liu, Dexuan Xu, Zimai Guo et al. · Peking University · Zhongguancun Laboratory

Reveals label inference attacks in VFL succeed due to feature-label alignment, proposes zero-overhead cut layer defense

Model Inversion Attack federated-learning
PDF
survey arXiv Mar 13, 2026 · 24d ago

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

Zonghao Ying, Xiao Yang, Siyang Wu et al. · Beihang University · Zhongguancun Laboratory +1 more

Security analysis of OpenClaw autonomous agents revealing prompt injection RCE, tool chain attacks, and proposing FASA defense architecture

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design Excessive Agency nlpmultimodal
PDF Code
attack arXiv Feb 24, 2026 · 5w ago

Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning

Yige Liu, Yiwei Lou, Che Wang et al. · Peking University · Zhongguancun Laboratory

Triggerless backdoor attack in vertical federated learning that replaces embeddings at inference to hijack predictions without training-time poisoning

Model Poisoning federated-learning
PDF
attack arXiv Feb 18, 2026 · 6w ago

Automating Agent Hijacking via Structural Template Injection

Xinhao Deng, Jiaqing Wu, Miao Chen et al. · Tsinghua University · Ant Group +1 more

Automated indirect prompt injection exploiting chat template tokens to hijack LLM agents, using Bayesian-optimized templates transferable to black-box commercial models

Prompt Injection nlp
1 citations PDF
defense arXiv Jan 15, 2026 · 11w ago

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

Hao Wang, Yanting Wang, Hao Li et al. · Beihang University · Peking University +1 more

Defends LLMs against jailbreaks via self-play RL where one model concurrently generates and resists adversarial prompts

Prompt Injection nlp
PDF
defense arXiv Jan 12, 2026 · 12w ago

When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent

Xinyi Wu, Geng Hong, Yueyue Chen et al. · Fudan University · Zhongguancun Laboratory +2 more

Discovers social engineering attacks hijack LLM web agents via malicious webpage content; proposes runtime defense reducing attack success by 78%

Prompt Injection Excessive Agency nlp
1 citations PDF
attack arXiv Jan 9, 2026 · 12w ago

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

Zhaoqi Wang, Zijian Zhang, Daqing He et al. · Beijing Institute of Technology · University of Auckland +2 more

Jailbreaks aligned LLMs by disguising malicious queries as tool calls and using RL to iteratively escalate response harmfulness across turns

Prompt Injection Insecure Plugin Design nlp
PDF
defense arXiv Jan 5, 2026 · Jan 2026

FAROS: Robust Federated Learning with Adaptive Scaling against Backdoor Attacks

Chenyu Hu, Qiming Hu, Sinan Chen et al. · Southwest University · University of Electronic Science and Technology of China +3 more

Defends federated learning against adaptive backdoor attacks using dynamic gradient scaling and robust core-set aggregation

Model Poisoning federated-learningvision
PDF
attack arXiv Dec 16, 2025 · Dec 2025

CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World

Shuxin Zhao, Bo Lang, Nan Xiao et al. · Beihang University · Zhongguancun Laboratory

Backdoor attack on object detectors using inter-object spatial interaction patterns as triggers, enabling multi-trigger-multi-object attacks with 97%+ success in real-world scenes

Model Poisoning vision
PDF
attack arXiv Oct 27, 2025 · Oct 2025

Exploring Semantic-constrained Adversarial Example with Instruction Uncertainty Reduction

Jin Hu, Jiakai Wang, Linna Jing et al. · Beihang University · Zhongguancun Laboratory +1 more

Generates transferable semantically constrained adversarial images from natural language instructions using diffusion models with uncertainty reduction

Input Manipulation Attack visionmultimodal
PDF
benchmark arXiv Oct 11, 2025 · Oct 2025

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

Zonghao Ying, Yangguang Shao, Jianle Gan et al. · Beihang University · Chinese Academy of Sciences +7 more

Benchmark evaluating LVLM web agent security across six attack vectors in realistic web environments, exposing universal vulnerabilities across 9 models

Prompt Injection Excessive Agency multimodalnlp
5 citations PDF
attack EMNLP Sep 25, 2025 · Sep 2025

Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation

Wenkai Guo, Xuefeng Liu, Haolin Wang et al. · Beihang University · Zhongguancun Laboratory +3 more

Demonstrates training data extraction from federated LLM global models and proposes FL-specific attack tracking parameter updates across rounds

Model Inversion Attack Sensitive Information Disclosure nlpfederated-learning
PDF Code
attack arXiv Sep 20, 2025 · Sep 2025

Delving into Cryptanalytic Extraction of PReLU Neural Networks

Yi Chen, Xiaoyang Dong, Ruijie Ma et al. · Tsinghua University · Zhongguancun Laboratory +2 more

Cryptanalytic black-box attack recovers PReLU network parameters exactly, extending model extraction beyond ReLU to parametric activations

Model Theft vision
PDF
defense arXiv Aug 27, 2025 · Aug 2025

Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks

Sheng Liu, Qiang Sheng, Danding Wang et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Proactively synthesizes jailbreak-like training examples using embedding-space analysis to harden LLM safety alignment before attacks emerge

Prompt Injection nlp
PDF
defense in IEEE Transactions on Depend... Jan 9, 2025 · Jan 2025

TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning

Runhua Xu, Bo Li, Chao Li et al. · Beihang University · Zhongguancun Laboratory +2 more

Defends FL training data against gradient inference attacks using threshold functional encryption tolerating malicious aggregators

Model Inversion Attack federated-learning
PDF