Robust Privacy: Inference-Time Privacy through Certified Robustness
Jiankai Jin, Xiangzheng Zhang, Zhao Liu, Deyue Zhang, Quanchen Zou
Published on arXiv
2601.17360
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
RP reduces model inversion attack success rate from 73% to 4% at σ=0.1 (with partial model performance degradation), and reduces ASR to 44% with no model performance degradation.
Robust Privacy (RP) / Attribute Privacy Enhancement (APE)
Novel technique introduced
Machine learning systems can produce personalized outputs that allow an adversary to infer sensitive input attributes at inference time. We introduce Robust Privacy (RP), an inference-time privacy notion inspired by certified robustness: if a model's prediction is provably invariant within a radius-$R$ neighborhood around an input $x$ (e.g., under the $\ell_2$ norm), then $x$ enjoys $R$-Robust Privacy, i.e., observing the prediction cannot distinguish $x$ from any input within distance $R$ of $x$. We further develop Attribute Privacy Enhancement (APE) to translate this input-level invariance into an attribute-level privacy effect. In a controlled recommendation task where the decision depends primarily on a sensitive attribute, we show that RP expands the set of sensitive-attribute values compatible with a positive recommendation, widening the interval an adversary can infer. Finally, we empirically demonstrate that RP also mitigates model inversion attacks (MIAs) by masking fine-grained input-output dependence. Even at a small noise level ($\sigma=0.1$), RP reduces the attack success rate (ASR) from 73% to 4%, at the cost of partial model performance degradation; alternatively, RP partially mitigates MIAs (ASR drops to 44%) with no performance degradation.
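The invariance underlying RP can be illustrated with a randomized-smoothing-style predictor, the standard route to certified robustness. This is a minimal hypothetical sketch, not the paper's construction: the toy base model, the threshold 0.5, and the Monte Carlo majority vote are all assumptions. The point is only that a smoothed prediction varies slowly with the input, so nearby inputs receive the same output and cannot be distinguished by observing it.

```python
# Hypothetical sketch of an RP-style invariant prediction via randomized
# smoothing (the base model, sigma, and vote count are assumptions; the
# paper's mechanism may differ).
import numpy as np

def base_model(x):
    # Toy base classifier: positive class iff the (sensitive) first
    # coordinate exceeds a threshold.
    return int(x[0] > 0.5)

def smoothed_predict(x, sigma=0.1, n=1000, rng=None):
    # Majority vote over Gaussian perturbations of the input. The vote
    # fraction changes smoothly with x, so inputs within a small radius R
    # of each other receive the same prediction.
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n, len(x)))
    votes = [base_model(x + eps) for eps in noise]
    return int(np.mean(votes) > 0.5)

x = np.array([0.9, 0.2])
x_nearby = np.array([0.85, 0.25])  # within a small L2 ball around x
print(smoothed_predict(x), smoothed_predict(x_nearby))  # same prediction
```

An adversary who sees only the smoothed output therefore learns that the input lies somewhere in a radius-R ball, not which point in the ball it is.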
Key Contributions
- Introduces Robust Privacy (RP), an inference-time privacy notion that repurposes certified robustness: prediction invariance within radius R around input x guarantees that observing the output cannot distinguish x from any input in that neighborhood.
- Develops Attribute Privacy Enhancement (APE) to translate input-level invariance into attribute-level privacy, demonstrated on a controlled recommendation task where it widens the inference interval for sensitive attributes.
- Empirically shows RP mitigates model inversion attacks by masking input-output dependence, reducing ASR from 73% to 4% at σ=0.1 (with partial model performance degradation), or to 44% with none.
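The APE effect described above can be sketched numerically. This is a simplified illustration, not the paper's derivation: the (0.5, 1.0] base interval, the [0, 1] attribute domain, and the helper name are assumptions. If a positive recommendation is R-robust, it is compatible with every input within distance R, so the attribute interval an adversary can infer from the output widens by R on each side.

```python
# Hypothetical APE illustration: an R-robust positive prediction is
# compatible with all inputs within distance R, widening the inferable
# attribute interval. (Interval values and the [0, 1] domain are assumed.)

def inferred_interval(base_interval, R, domain=(0.0, 1.0)):
    """Attribute values compatible with a positive prediction under
    R-Robust Privacy, clamped to the attribute's valid domain."""
    lo, hi = base_interval
    return (max(domain[0], lo - R), min(domain[1], hi + R))

# Without RP, a positive recommendation reveals attr in (0.5, 1.0];
# with R = 0.2, the compatible set grows to roughly [0.3, 1.0].
print(inferred_interval((0.5, 1.0), 0.2))
```

A larger certified radius R thus directly translates into a larger set of sensitive-attribute values the adversary must consider plausible.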
🛡️ Threat Analysis
The paper's primary threat model is a model inversion attack (MIA), in which an adversary reconstructs or infers sensitive input attributes by observing model predictions. Robust Privacy (RP) is explicitly evaluated as a defense against MIAs, reducing ASR from 73% to 4% at σ=0.1. This directly targets the ML03 threat of recovering private input data from model outputs.
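A toy inversion sketch shows why fine-grained outputs are the leak RP closes. This is a hypothetical example, not the paper's attack: the sigmoid model, its slope of 10, and the secret value 0.73 are all assumptions. When the model releases an exact confidence that is monotone in a sensitive attribute, the adversary recovers the attribute by inverting it; an RP-style output that is constant over a radius-R neighborhood destroys this point-wise inverse.

```python
# Hypothetical model inversion sketch (toy model; not the paper's attack).
import numpy as np

def confidence(attr):
    # Toy model whose released confidence is monotone in the sensitive
    # attribute: sigmoid(10 * (attr - 0.5)).
    return 1.0 / (1.0 + np.exp(-10.0 * (attr - 0.5)))

def invert_confidence(conf):
    # Model inversion: with the exact confidence, the adversary inverts
    # the sigmoid and recovers the attribute precisely.
    return 0.5 + np.log(conf / (1.0 - conf)) / 10.0

secret = 0.73
recovered = invert_confidence(confidence(secret))  # recovers 0.73 exactly

# Under RP the released output is invariant across a radius-R
# neighborhood, so the adversary observes only a locally constant
# signal and can pin the attribute down to an interval, not a point.
```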