ML Security Papers

Stats

Latest papers

15 papers

defense arXiv Mar 19, 2026 · 18d ago

Revisiting Label Inference Attacks in Vertical Federated Learning: Why They Are Vulnerable and How to Defend

Yige Liu, Dexuan Xu, Zimai Guo et al. · Peking University · Zhongguancun Laboratory

Reveals label inference attacks in VFL succeed due to feature-label alignment, proposes zero-overhead cut layer defense

Model Inversion Attack federated-learning

PDF

survey arXiv Mar 13, 2026 · 24d ago

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

Zonghao Ying, Xiao Yang, Siyang Wu et al. · Beihang University · Zhongguancun Laboratory +1 more

Security analysis of OpenClaw autonomous agents revealing prompt injection RCE, tool chain attacks, and proposing FASA defense architecture

AI Supply Chain Attacks Prompt Injection Insecure Plugin Design Excessive Agency nlpmultimodal

PDF Code

attack arXiv Feb 24, 2026 · 5w ago

Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning

Yige Liu, Yiwei Lou, Che Wang et al. · Peking University · Zhongguancun Laboratory

Triggerless backdoor attack in vertical federated learning that replaces embeddings at inference to hijack predictions without training-time poisoning

Model Poisoning federated-learning

PDF

attack arXiv Feb 18, 2026 · 6w ago

Automating Agent Hijacking via Structural Template Injection

Xinhao Deng, Jiaqing Wu, Miao Chen et al. · Tsinghua University · Ant Group +1 more

Automated indirect prompt injection exploiting chat template tokens to hijack LLM agents, using Bayesian-optimized templates transferable to black-box commercial models

Prompt Injection nlp

1 citations PDF

defense arXiv Jan 15, 2026 · 11w ago

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

Hao Wang, Yanting Wang, Hao Li et al. · Beihang University · Peking University +1 more

Defends LLMs against jailbreaks via self-play RL where one model concurrently generates and resists adversarial prompts

Prompt Injection nlp

PDF

defense arXiv Jan 12, 2026 · 12w ago

When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent

Xinyi Wu, Geng Hong, Yueyue Chen et al. · Fudan University · Zhongguancun Laboratory +2 more

Discovers social engineering attacks hijack LLM web agents via malicious webpage content; proposes runtime defense reducing attack success by 78%

Prompt Injection Excessive Agency nlp

1 citations PDF

attack arXiv Jan 9, 2026 · 12w ago

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

Zhaoqi Wang, Zijian Zhang, Daqing He et al. · Beijing Institute of Technology · University of Auckland +2 more

Jailbreaks aligned LLMs by disguising malicious queries as tool calls and using RL to iteratively escalate response harmfulness across turns

Prompt Injection Insecure Plugin Design nlp

PDF

defense arXiv Jan 5, 2026 · Jan 2026

FAROS: Robust Federated Learning with Adaptive Scaling against Backdoor Attacks

Chenyu Hu, Qiming Hu, Sinan Chen et al. · Southwest University · University of Electronic Science and Technology of China +3 more

Defends federated learning against adaptive backdoor attacks using dynamic gradient scaling and robust core-set aggregation

Model Poisoning federated-learningvision

PDF

attack arXiv Dec 16, 2025 · Dec 2025

CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World

Shuxin Zhao, Bo Lang, Nan Xiao et al. · Beihang University · Zhongguancun Laboratory

Backdoor attack on object detectors using inter-object spatial interaction patterns as triggers, enabling multi-trigger-multi-object attacks with 97%+ success in real-world scenes

Model Poisoning vision

PDF

attack arXiv Oct 27, 2025 · Oct 2025

Exploring Semantic-constrained Adversarial Example with Instruction Uncertainty Reduction

Jin Hu, Jiakai Wang, Linna Jing et al. · Beihang University · Zhongguancun Laboratory +1 more

Generates transferable semantically constrained adversarial images from natural language instructions using diffusion models with uncertainty reduction

Input Manipulation Attack visionmultimodal

PDF

Recently, semantically constrained adversarial examples (SemanticAE), which are directly generated from natural language instructions, have become a promising avenue for future research due to their flexible attacking forms. To generate SemanticAEs, current methods fall short of satisfactory attacking ability as the key underlying factors of semantic uncertainty in human instructions, such as referring diversity, descriptive incompleteness, and boundary ambiguity, have not been fully investigated. To tackle the issues, this paper develops a multi-dimensional instruction uncertainty reduction (InSUR) framework to generate more satisfactory SemanticAE, i.e., transferable, adaptive, and effective. Specifically, in the dimension of the sampling method, we propose the residual-driven attacking direction stabilization to alleviate the unstable adversarial optimization caused by the diversity of language references. By coarsely predicting the language-guided sampling process, the optimization process will be stabilized by the designed ResAdv-DDIM sampler, therefore releasing the transferable and robust adversarial capability of multi-step diffusion models. In task modeling, we propose the context-encoded attacking scenario constraint to supplement the missing knowledge from incomplete human instructions. Guidance masking and renderer integration are proposed to regulate the constraints of 2D/3D SemanticAE, activating stronger scenario-adapted attacks. Moreover, in the dimension of generator evaluation, we propose the semantic-abstracted attacking evaluation enhancement by clarifying the evaluation boundary, facilitating the development of more effective SemanticAE generators. Extensive experiments demonstrate the superiority of the transfer attack performance of InSUR. Moreover, we realize the reference-free generation of semantically constrained 3D adversarial examples for the first time.

diffusion cnn transformer Beihang University · Zhongguancun Laboratory · ETH Zürich

PDF arXiv DOI

benchmark arXiv Oct 11, 2025 · Oct 2025

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

Zonghao Ying, Yangguang Shao, Jianle Gan et al. · Beihang University · Chinese Academy of Sciences +7 more

Benchmark evaluating LVLM web agent security across six attack vectors in realistic web environments, exposing universal vulnerabilities across 9 models

Prompt Injection Excessive Agency multimodalnlp

5 citations PDF

attack EMNLP Sep 25, 2025 · Sep 2025

Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation

Wenkai Guo, Xuefeng Liu, Haolin Wang et al. · Beihang University · Zhongguancun Laboratory +3 more

Demonstrates training data extraction from federated LLM global models and proposes FL-specific attack tracking parameter updates across rounds

Model Inversion Attack Sensitive Information Disclosure nlpfederated-learning

PDF Code

attack arXiv Sep 20, 2025 · Sep 2025

Delving into Cryptanalytic Extraction of PReLU Neural Networks

Yi Chen, Xiaoyang Dong, Ruijie Ma et al. · Tsinghua University · Zhongguancun Laboratory +2 more

Cryptanalytic black-box attack recovers PReLU network parameters exactly, extending model extraction beyond ReLU to parametric activations

Model Theft vision

PDF

defense arXiv Aug 27, 2025 · Aug 2025

Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks

Sheng Liu, Qiang Sheng, Danding Wang et al. · Chinese Academy of Sciences · University of Chinese Academy of Sciences +1 more

Proactively synthesizes jailbreak-like training examples using embedding-space analysis to harden LLM safety alignment before attacks emerge

Prompt Injection nlp

PDF

defense in IEEE Transactions on Depend... Jan 9, 2025 · Jan 2025

TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning

Runhua Xu, Bo Li, Chao Li et al. · Beihang University · Zhongguancun Laboratory +2 more

Defends FL training data against gradient inference attacks using threshold functional encryption tolerating malicious aggregators

Model Inversion Attack federated-learning

PDF

Latest papers

Revisiting Label Inference Attacks in Vertical Federated Learning: Why They Are Vulnerable and How to Defend

Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw

Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning

Automating Agent Hijacking via Structural Template Injection

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

FAROS: Robust Federated Learning with Adaptive Scaling against Backdoor Attacks

CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World

Exploring Semantic-constrained Adversarial Example with Instruction Uncertainty Reduction

SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents

Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation

Delving into Cryptanalytic Extraction of PReLU Neural Networks

Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks

TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning

Filters

Time Period

Paper Type

OWASP ML Top 10

OWASP LLM Top 10

Institution

Venue