defense arXiv Feb 10, 2026
Kun Wang, Zherui Li, Zhenhong Zhou et al. · Nanyang Technological University · Beijing University of Posts and Telecommunications +4 more
Exposes cross-modal jailbreak vulnerabilities in omni-modal LLMs and defends via SVD-guided refusal vector amplification with lightweight adapters
Prompt Injection multimodal nlp
Omni-modal Large Language Models (OLLMs) greatly expand LLMs' multimodal capabilities but also introduce cross-modal safety risks. However, a systematic understanding of vulnerabilities in omni-modal interactions remains lacking. To bridge this gap, we establish a modality-semantics decoupling principle and construct the AdvBench-Omni dataset, which reveals a significant vulnerability in OLLMs. Mechanistic analysis uncovers a Mid-layer Dissolution phenomenon driven by shrinkage of the refusal vector's magnitude, alongside a modal-invariant pure refusal direction. Guided by these insights, we extract a golden refusal vector using Singular Value Decomposition and propose OmniSteer, which uses lightweight adapters to adaptively modulate intervention intensity. Extensive experiments show that our method not only raises the Refusal Success Rate against harmful inputs from 69.9% to 91.2%, but also preserves general capabilities across all modalities. Our code is available at: https://github.com/zhrli324/omni-safety-research.
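The abstract does not spell out how the golden refusal vector is extracted or applied; a minimal sketch of one plausible reading (SVD over harmful-minus-benign hidden-state differences pooled across modalities, with a scalar gain standing in for the adapter) might look like this. All function names, shapes, and the steering rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def golden_refusal_vector(h_harmful, h_benign):
    """Top singular direction of harmful-minus-benign hidden-state shifts.

    h_harmful, h_benign: (n_samples, d) arrays of hidden states collected
    across modalities (names and shapes are assumptions for this sketch).
    """
    D = h_harmful - h_benign                       # refusal-related differences
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[0]                                   # shared "golden" direction

def steer(hidden, v, alpha):
    """Amplify the refusal component of a hidden state by a gain alpha.

    In OmniSteer the intervention intensity would come from a lightweight
    adapter; here alpha is simply a caller-supplied scalar.
    """
    return hidden + alpha * np.dot(hidden, v) * v
```

Note that the projection `np.dot(hidden, v) * v` is invariant to the sign ambiguity of the SVD direction, so amplification behaves the same whichever sign `Vt[0]` comes back with.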
llm vlm multimodal Nanyang Technological University · Beijing University of Posts and Telecommunications · Tsinghua University +3 more
attack arXiv Feb 9, 2026
Ziwei Wang, Yuanhe Zhang, Jing Chen et al. · Wuhan University · Beijing University of Posts and Telecommunications +3 more
Crafts counterfactual prompts using Recursive Entropy to force LRMs into infinite thinking loops, reducing throughput by 90%
Model Denial of Service nlp
Large Reasoning Models (LRMs) employ explicit reasoning to address complex tasks. Such reasoning requires extended context lengths, resulting in substantially higher resource consumption. Prior work has shown that adversarially crafted inputs can trigger redundant reasoning, exposing LRMs to resource-exhaustion vulnerabilities. However, the reasoning process itself, especially its reflective component, has received limited attention, even though over-reflection can consume excessive compute. In this paper, we introduce Recursive Entropy to quantify the resource-consumption risk of reflection, revealing safety issues inherent in inference itself. Building on Recursive Entropy, we propose RECUR, a resource-exhaustion attack via Recursive Entropy guided Counterfactual Utilization and Reflection, which constructs counterfactual questions to expose the inherent flaws and risks of LRMs. Extensive experiments demonstrate that under benign inference, Recursive Entropy exhibits a pronounced decreasing trend; RECUR disrupts this trend, increasing output length by up to 11x and reducing throughput by 90%. Our work provides a new perspective on robust reasoning.
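The abstract leaves the Recursive Entropy formula unspecified; as a hedged illustration of the "decreasing trend" it describes, one could track a Shannon entropy per reflection segment and summarize the trajectory with a least-squares slope: markedly negative under benign inference, flattened when an attack keeps the model reflecting. The function names and the slope-based trend summary are assumptions for this sketch, not the paper's definition.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in bits) of a probability distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

def entropy_trend(segment_dists):
    """Per-segment entropies and a least-squares trend slope.

    segment_dists: one probability distribution per reflection segment,
    a stand-in for whatever statistic Recursive Entropy actually uses.
    A clearly negative slope is the benign pattern the paper reports;
    a slope near zero would indicate reflection that fails to converge.
    """
    ents = [shannon_entropy(p) for p in segment_dists]
    slope = float(np.polyfit(np.arange(len(ents)), ents, 1)[0])
    return ents, slope
```

For example, segments whose distributions sharpen from uniform-over-4 to one-hot yield entropies 2, 1, 0 bits and a slope of -1, the converging pattern that a RECUR-style counterfactual prompt would disrupt.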
llm Wuhan University · Beijing University of Posts and Telecommunications · Nanyang Technological University +2 more