Scaffold-SLAM: Structured 3D Gaussians for Simultaneous Localization and Photorealistic Mapping

3D Gaussian Splatting (3DGS) has recently revolutionized novel view synthesis in Simultaneous Localization and Mapping (SLAM). However, existing SLAM methods that utilize 3DGS have failed to provide high-quality novel view rendering for monocular, stereo, and RGB-D cameras simultaneously. Notably, some methods perform well for RGB-D cameras but suffer severe degradation in rendering quality for monocular cameras. In this paper, we present Scaffold-SLAM, which delivers simultaneous localization and high-quality photorealistic mapping across monocular, stereo, and RGB-D cameras. We introduce two key innovations to achieve this state-of-the-art visual quality. First, we propose an Appearance-from-Motion embedding, which enables the 3D Gaussians to better model image appearance variations across different camera poses. Second, we introduce a frequency regularization pyramid to guide the distribution of Gaussians, allowing the model to effectively capture finer details in the scene. Extensive experiments on monocular, stereo, and RGB-D datasets demonstrate that Scaffold-SLAM significantly outperforms state-of-the-art methods in photorealistic mapping quality, e.g., PSNR is 16.76% higher on the TUM RGB-D datasets for monocular cameras.
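
As a rough illustration of the two ideas named in the abstract, the sketch below shows (a) a small MLP that maps a camera pose to an appearance code, one plausible reading of the "Appearance-from-Motion embedding", and (b) a Laplacian-pyramid loss that compares rendered and ground-truth images band by band, one plausible form of a "frequency regularization pyramid". All module names, dimensions, the pose parameterization, and the pyramid weights are assumptions made for illustration; the paper's actual architecture and loss may differ.

```python
# Minimal sketch (PyTorch). Everything here is an assumption-based illustration,
# not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AppearanceFromMotion(nn.Module):
    """Map a camera pose to a low-dimensional appearance code that could
    condition the color prediction of the 3D Gaussians (hypothetical layout)."""

    def __init__(self, embed_dim: int = 16):
        super().__init__()
        # 3 translation + 4 quaternion components = 7 pose parameters (assumed)
        self.mlp = nn.Sequential(
            nn.Linear(7, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, cam_t: torch.Tensor, cam_q: torch.Tensor) -> torch.Tensor:
        # cam_t: (..., 3) translation, cam_q: (..., 4) unit quaternion
        return self.mlp(torch.cat([cam_t, cam_q], dim=-1))


def laplacian_pyramid(img: torch.Tensor, levels: int = 3):
    """Band-pass decomposition of an image batch (B, C, H, W)."""
    pyramid, current = [], img
    for _ in range(levels):
        down = F.avg_pool2d(current, kernel_size=2)
        up = F.interpolate(down, size=current.shape[-2:],
                           mode="bilinear", align_corners=False)
        pyramid.append(current - up)   # high-frequency residual at this scale
        current = down
    pyramid.append(current)            # low-frequency base
    return pyramid


def frequency_regularization_loss(rendered, target,
                                  weights=(1.0, 0.5, 0.25, 0.1)):
    """Penalize per-band differences so fine detail is not washed out.
    The per-level weights are arbitrary placeholders."""
    loss = 0.0
    for w, r, t in zip(weights,
                       laplacian_pyramid(rendered),
                       laplacian_pyramid(target)):
        loss = loss + w * F.l1_loss(r, t)
    return loss
```

In this reading, the pose-conditioned appearance code plays a role similar to the per-image appearance embeddings used in prior novel-view-synthesis work, while the per-band loss gives the optimizer an explicit signal on high-frequency residuals, which is one way such a term could guide where Gaussians are densified.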
