defense · arXiv · Nov 17, 2025
Ruijun Deng, Zhihui Lu, Qiang Duan · Fudan University · Pennsylvania State University
Defends split inference against data reconstruction attacks by decomposing redundant smashed-data information before injecting calibrated privacy noise
Model Inversion Attack · vision
Split inference (SI) enables users to access deep learning (DL) services without directly transmitting raw data. However, recent studies reveal that data reconstruction attacks (DRAs) can recover the original inputs from the smashed data sent from the client to the server, leading to significant privacy leakage. While various defenses have been proposed, they often result in substantial utility degradation, particularly when the client-side model is shallow. We identify a key cause of this trade-off: existing defenses apply excessive perturbation to redundant information in the smashed data. To address this issue in computer vision tasks, we propose InfoDecom, a defense framework that first decomposes and removes redundant information and then injects noise calibrated to provide theoretically guaranteed privacy. Experiments demonstrate that InfoDecom achieves a superior utility-privacy trade-off compared to existing baselines.
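The core idea of decomposing redundant information before perturbation can be illustrated with a minimal sketch. The code below is a hypothetical stand-in, not the paper's actual InfoDecom algorithm: it uses SVD truncation as the "decomposition" step (the paper's decomposition method and noise calibration are not specified in this abstract) and fixed-scale Gaussian noise as a placeholder for calibrated privacy noise.

```python
import numpy as np

def defend_smashed_data(smashed, keep_ratio=0.5, noise_std=0.1, seed=0):
    """Illustrative defense for split inference: decompose the smashed-data
    matrix, drop its lowest-energy (redundant) components, then inject noise
    into what remains. SVD and fixed-scale Gaussian noise are assumptions
    made only for this sketch."""
    rng = np.random.default_rng(seed)
    # Decompose the smashed data into singular directions.
    u, s, vt = np.linalg.svd(smashed, full_matrices=False)
    # Keep only the top-k directions; the rest is treated as redundant
    # information that would otherwise absorb excessive perturbation.
    k = max(1, int(np.ceil(keep_ratio * s.size)))
    reduced = (u[:, :k] * s[:k]) @ vt[:k, :]
    # Inject noise on the reduced representation (placeholder for the
    # paper's calibrated, theoretically guaranteed privacy noise).
    return reduced + rng.normal(0.0, noise_std, size=reduced.shape)

# Toy smashed data, e.g. the output of a shallow client-side model.
smashed = np.random.default_rng(1).normal(size=(8, 16))
protected = defend_smashed_data(smashed)
```

Because the noise is added only after the redundant components are removed, a given privacy budget perturbs the informative part of the representation less aggressively, which is the utility-privacy intuition the abstract describes.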
cnn
defense · arXiv · Dec 18, 2025
Jirui Yang, Hengqi Guo, Zhihui Lu et al. · Fudan University · Ant Group · Pennsylvania State University
Defends LLMs against harmful prompts by comparing refusal vs. agreement prefix log-probabilities with near-zero inference overhead
Prompt Injection · nlp
Large language models often face a three-way trade-off among detection accuracy, inference latency, and deployment cost when used in real-world safety-sensitive applications. This paper introduces Prefix Probing, a black-box harmful content detection method that compares the conditional log-probabilities of "agreement/execution" versus "refusal/safety" opening prefixes and leverages prefix caching to reduce detection overhead to near first-token latency. During inference, the method requires only a single log-probability computation over the probe prefixes to produce a harmfulness score and apply a threshold, without invoking any additional models or multi-stage inference. To further enhance the discriminative power of the prefixes, we design an efficient prefix construction algorithm that automatically discovers highly informative prefixes, substantially improving detection performance. Extensive experiments demonstrate that Prefix Probing achieves detection effectiveness comparable to mainstream external safety models while incurring only minimal computational cost and requiring no extra model deployment, highlighting its strong practicality and efficiency.
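The scoring logic can be sketched in a few lines. In the sketch below, `conditional_logprob` is a toy keyword heuristic standing in for a real model's log P(prefix | prompt) (which a deployment would obtain from a single forward pass over the probe prefixes, reusable via prefix caching); the prefix strings, the max-over-prefixes aggregation, and the zero threshold are all assumptions for illustration, not the paper's discovered prefixes or tuned threshold.

```python
import math

AGREE_PREFIXES = ["Sure, here is how to", "Certainly! The steps are"]
REFUSE_PREFIXES = ["I can't help with that", "I'm sorry, but I cannot"]

def conditional_logprob(prompt, prefix):
    """Toy stand-in for an LLM's log P(prefix | prompt). This keyword
    heuristic exists only to make the sketch runnable: harmful prompts
    make refusal prefixes likely, benign prompts make agreement likely."""
    harmful = any(w in prompt.lower() for w in ("bomb", "malware", "poison"))
    agrees = prefix in AGREE_PREFIXES
    p = 0.8 if (harmful != agrees) else 0.2
    return math.log(p)

def harmfulness_score(prompt):
    """Score = best refusal log-prob minus best agreement log-prob."""
    refuse = max(conditional_logprob(prompt, p) for p in REFUSE_PREFIXES)
    agree = max(conditional_logprob(prompt, p) for p in AGREE_PREFIXES)
    return refuse - agree

def is_harmful(prompt, threshold=0.0):
    # One log-probability comparison per request; no extra models invoked.
    return harmfulness_score(prompt) > threshold
```

The detection cost is dominated by scoring the short probe prefixes, which is why the abstract's claim of near first-token latency is plausible: no second model and no multi-stage pipeline sit on the critical path.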
llm · transformer