
Published on arXiv

2508.18971

Model Inversion Attack

OWASP ML Top 10 — ML03

Key Finding

ppNeSF reduces recoverable scene detail (average caption similarity 0.40 vs 0.68; FID 322 vs 250 on 7-Scenes) compared to a NeRF baseline without an RGB head, while maintaining competitive localization accuracy.

ppNeSF (Privacy-Preserving Neural Segmentation Field)

Novel technique introduced


Visual localization (VL) is the task of estimating the camera pose in a known scene. VL methods can be distinguished, among other criteria, by how they represent the scene: explicitly, through a (sparse) point cloud or a collection of images, or implicitly, through the weights of a neural network. Recently, NeRF-based methods have become popular for VL. While NeRFs offer high-quality novel view synthesis, they inadvertently encode fine scene details, raising privacy concerns when deployed in cloud-based localization services, as sensitive information could be recovered. In this paper, we tackle this challenge on two fronts. First, we propose a new protocol to assess the privacy preservation of NeRF-based representations. We show that NeRFs trained with photometric losses store fine-grained details in their geometry representations, making them vulnerable to privacy attacks even if the head that predicts colors is removed. Second, we propose ppNeSF (Privacy-Preserving Neural Segmentation Field), a NeRF variant trained with segmentation supervision instead of RGB images. These segmentation labels are learned in a self-supervised manner, ensuring they are coarse enough to obscure identifiable scene details while remaining discriminative in 3D. The segmentation space of ppNeSF can then be used for accurate visual localization, yielding state-of-the-art results.
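To make the core design concrete — a neural field supervised with segmentation labels rather than RGB — the following is a minimal sketch of a field that maps a 3D point to a density and per-class segmentation logits, volume-rendered along a ray with the standard NeRF quadrature. All weights, the class count `K`, and the sampling scheme are illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a ppNeSF-style MLP: 3D point -> (density, K seg logits).
K = 8                                  # number of segment classes (assumed)
W1 = rng.normal(0, 1, (3, 32))
W2 = rng.normal(0, 1, (32, 1 + K))

def field(points):
    """points: (N, 3) -> density (N,), logits (N, K)."""
    h = np.tanh(points @ W1)
    out = h @ W2
    density = np.log1p(np.exp(out[:, 0]))   # softplus keeps density >= 0
    return density, out[:, 1:]

def render_segmentation(origin, direction, n_samples=64, t_far=4.0):
    """Volume-render expected per-class probabilities along one ray
    (standard NeRF alpha compositing, but over seg logits, not RGB)."""
    t = np.linspace(0.0, t_far, n_samples)
    delta = t_far / n_samples
    pts = origin[None, :] + t[:, None] * direction[None, :]
    density, logits = field(pts)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)        # softmax per sample
    alpha = 1.0 - np.exp(-density * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return weights @ probs                            # (K,) class probs

seg = render_segmentation(np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(seg.round(3))
```

The point of the design is that only coarse class identities, not appearance, pass through the rendering pipeline, so a reconstruction attack has far less texture signal to invert.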


Key Contributions

  • Inversion attack demonstrating that NeRFs trained with photometric loss expose private scene texture details through geometry representations, even after the color-prediction head is removed
  • VLM-based privacy evaluation protocol using LLaVA to assess how much fine-grained scene detail can be recovered — more comprehensive than closed-set object detectors
  • ppNeSF: a privacy-preserving NeRF variant trained with self-supervised segmentation labels that achieves state-of-the-art visual localization without encoding identifiable scene appearance
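The caption-similarity scoring step of the VLM-based protocol can be sketched as follows: a VLM captions both a reference view and an attack reconstruction, and the two captions are compared numerically. The bag-of-words cosine below is a deliberately simple stand-in for whatever text-similarity measure the paper actually uses; the captions are hypothetical.

```python
import numpy as np
from collections import Counter

def caption_similarity(cap_a, cap_b):
    """Cosine similarity between bag-of-words vectors of two captions
    (a stronger text embedding would be used in practice; this only
    illustrates the scoring step of the protocol)."""
    ca, cb = Counter(cap_a.lower().split()), Counter(cap_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    va = np.array([ca[w] for w in vocab], float)
    vb = np.array([cb[w] for w in vocab], float)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Hypothetical captions: one for a held-out reference view, one for an
# attack reconstruction of the same scene.
ref = "a cluttered office desk with a monitor and papers"
rec = "a blurry room with indistinct shapes"
print(round(caption_similarity(ref, rec), 2))
```

A low score between reference and reconstruction captions (as in the reported 0.40 vs 0.68) indicates the representation leaks little describable scene content.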

🛡️ Threat Analysis

Model Inversion Attack

The primary attack inverts the rendered internal representations of a trained NeRF to reconstruct the private scene images used at training time — an adversary recovers identifiable scene details from model internals even after the color-prediction head is removed. ppNeSF is explicitly designed and evaluated as a defense against this data reconstruction threat.


Details

Domains
vision
Model Types
vlm
Threat Tags
white_box, inference_time
Datasets
7-Scenes, mip-NeRF 360
Applications
visual localization, cloud-based localization services, autonomous driving