Robust Privacy: Inference-Time Privacy through Certified Robustness
Jiankai Jin, Xiangzheng Zhang, Zhao Liu, Deyue Zhang, Quanchen Zou
Published on arXiv
2601.17360
Model Inversion Attack
OWASP ML Top 10 — ML03
Key Finding
RP reduces model inversion attack success rate from 73% to 4% at σ=0.1 (with partial model performance degradation), and reduces ASR to 44% with no model performance degradation.
Robust Privacy (RP) / Attribute Privacy Enhancement (APE)
Novel technique introduced
Machine learning systems can produce personalized outputs that allow an adversary to infer sensitive input attributes at inference time. We introduce Robust Privacy (RP), an inference-time privacy notion inspired by certified robustness: if a model's prediction is provably invariant within a radius-$R$ neighborhood around an input $x$ (e.g., under the $\ell_2$ norm), then $x$ enjoys $R$-Robust Privacy, i.e., observing the prediction cannot distinguish $x$ from any input within distance $R$ of $x$. We further develop Attribute Privacy Enhancement (APE) to translate this input-level invariance into an attribute-level privacy effect. In a controlled recommendation task where the decision depends primarily on a sensitive attribute, we show that RP expands the set of sensitive-attribute values compatible with a positive recommendation, widening the interval an adversary can infer. Finally, we empirically demonstrate that RP also mitigates model inversion attacks (MIAs) by masking fine-grained input-output dependence. Even at a small noise level ($\sigma=0.1$), RP reduces the attack success rate (ASR) from 73% to 4%, at the cost of partial model performance degradation; alternatively, RP partially mitigates MIAs (ASR drops to 44%) with no performance degradation.
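The invariance underlying RP can be illustrated with a randomized-smoothing-style predictor, the standard route to certified robustness. This is a minimal hypothetical sketch, not the paper's construction: the toy base model, the threshold 0.5, and the Monte Carlo majority vote are all assumptions. The point is only that a smoothed prediction varies slowly with the input, so nearby inputs receive the same output and cannot be distinguished by observing it.

```python
# Hypothetical sketch of an RP-style invariant prediction via randomized
# smoothing (the base model, sigma, and vote count are assumptions; the
# paper's mechanism may differ).
import numpy as np

def base_model(x):
    # Toy base classifier: positive class iff the (sensitive) first
    # coordinate exceeds a threshold.
    return int(x[0] > 0.5)

def smoothed_predict(x, sigma=0.1, n=1000, rng=None):
    # Majority vote over Gaussian perturbations of the input. The vote
    # fraction changes smoothly with x, so inputs within a small radius R
    # of each other receive the same prediction.
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n, len(x)))
    votes = [base_model(x + eps) for eps in noise]
    return int(np.mean(votes) > 0.5)

x = np.array([0.9, 0.2])
x_nearby = np.array([0.85, 0.25])  # within a small L2 ball around x
print(smoothed_predict(x), smoothed_predict(x_nearby))  # same prediction
```

An adversary who sees only the smoothed output therefore learns that the input lies somewhere in a radius-R ball, not which point in the ball it is.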
Key Contributions
- Introduces Robust Privacy (RP), an inference-time privacy notion that repurposes certified robustness: prediction invariance within radius R around input x guarantees that observing the output cannot distinguish x from any input in that neighborhood.
- Develops Attribute Privacy Enhancement (APE) to translate input-level invariance into attribute-level privacy, demonstrated on a controlled recommendation task where it widens the inference interval for sensitive attributes.
- Empirically shows RP mitigates model inversion attacks by masking input-output dependence, reducing ASR from 73% to 4% at σ=0.1 (with partial model performance degradation), or to 44% with none.
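The APE effect described above can be sketched numerically. This is a simplified illustration, not the paper's derivation: the (0.5, 1.0] base interval, the [0, 1] attribute domain, and the helper name are assumptions. If a positive recommendation is R-robust, it is compatible with every input within distance R, so the attribute interval an adversary can infer from the output widens by R on each side.

```python
# Hypothetical APE illustration: an R-robust positive prediction is
# compatible with all inputs within distance R, widening the inferable
# attribute interval. (Interval values and the [0, 1] domain are assumed.)

def inferred_interval(base_interval, R, domain=(0.0, 1.0)):
    """Attribute values compatible with a positive prediction under
    R-Robust Privacy, clamped to the attribute's valid domain."""
    lo, hi = base_interval
    return (max(domain[0], lo - R), min(domain[1], hi + R))

# Without RP, a positive recommendation reveals attr in (0.5, 1.0];
# with R = 0.2, the compatible set grows to roughly [0.3, 1.0].
print(inferred_interval((0.5, 1.0), 0.2))
```

A larger certified radius R thus directly translates into a larger set of sensitive-attribute values the adversary must consider plausible.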
🛡️ Threat Analysis
The paper's primary threat model is a model inversion attack (MIA), in which an adversary reconstructs or infers sensitive input attributes by observing model predictions. Robust Privacy (RP) is explicitly evaluated as a defense against MIAs, reducing ASR from 73% to 4% at σ=0.1. This directly targets the ML03 threat of recovering private input data from model outputs.
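A toy inversion sketch shows why fine-grained outputs are the leak RP closes. This is a hypothetical example, not the paper's attack: the sigmoid model, its slope of 10, and the secret value 0.73 are all assumptions. When the model releases an exact confidence that is monotone in a sensitive attribute, the adversary recovers the attribute by inverting it; an RP-style output that is constant over a radius-R neighborhood destroys this point-wise inverse.

```python
# Hypothetical model inversion sketch (toy model; not the paper's attack).
import numpy as np

def confidence(attr):
    # Toy model whose released confidence is monotone in the sensitive
    # attribute: sigmoid(10 * (attr - 0.5)).
    return 1.0 / (1.0 + np.exp(-10.0 * (attr - 0.5)))

def invert_confidence(conf):
    # Model inversion: with the exact confidence, the adversary inverts
    # the sigmoid and recovers the attribute precisely.
    return 0.5 + np.log(conf / (1.0 - conf)) / 10.0

secret = 0.73
recovered = invert_confidence(confidence(secret))  # recovers 0.73 exactly

# Under RP the released output is invariant across a radius-R
# neighborhood, so the adversary observes only a locally constant
# signal and can pin the attribute down to an interval, not a point.
```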