
SLGaussian: Fast Language Gaussian Splatting in Sparse Views

3D semantic field learning is crucial for applications like autonomous navigation, AR/VR, and robotics, where accurate comprehension of 3D scenes from limited viewpoints is essential. Existing methods struggle under sparse-view conditions, relying on inefficient per-scene multi-view optimizations that are impractical for many real-world tasks. To address this, we propose SLGaussian, a feed-forward method for constructing 3D semantic fields from sparse viewpoints, allowing direct inference of 3DGS-based scenes. By ensuring consistent SAM segmentations through video tracking and using low-dimensional indexing for high-dimensional CLIP features, SLGaussian efficiently embeds language information in 3D space, offering a robust solution for accurate 3D scene understanding under sparse-view conditions. In experiments on two-view sparse 3D object querying and segmentation on the LERF and 3D-OVS datasets, SLGaussian outperforms existing methods in chosen IoU, Localization Accuracy, and mIoU. Moreover, our model achieves scene inference in under 30 seconds and open-vocabulary querying in just 0.011 seconds per query.

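The low-dimensional indexing idea can be made concrete with a small sketch. The NumPy snippet below is illustrative only, not the paper's implementation: it assumes each Gaussian stores a compact segment id (coming from the tracking-consistent SAM segmentation) that indexes into a small codebook of CLIP features, so an open-vocabulary query needs one similarity computation per codebook entry rather than per Gaussian. All names, dimensions, and the similarity threshold are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

num_gaussians = 100_000   # Gaussians in the reconstructed 3DGS scene (illustrative)
num_segments = 64         # consistent SAM segments after video tracking (illustrative)
clip_dim = 512            # CLIP ViT-B/32 feature dimension

# Codebook: one CLIP feature per consistent segment. Random stand-ins here;
# in practice these would be CLIP image features of the segmented regions.
codebook = rng.standard_normal((num_segments, clip_dim)).astype(np.float32)
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

# Low-dimensional index: each Gaussian stores a single segment id instead of
# a full high-dimensional CLIP vector.
gaussian_segment_id = rng.integers(0, num_segments, size=num_gaussians)

def query(text_feature: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Return a boolean mask over Gaussians that match the text query.

    `text_feature` stands in for a CLIP text embedding of the query phrase.
    Cosine similarity is computed once per codebook entry and then broadcast
    to all Gaussians through their stored segment ids.
    """
    text_feature = text_feature / np.linalg.norm(text_feature)
    segment_sims = codebook @ text_feature                 # shape: (num_segments,)
    return segment_sims[gaussian_segment_id] > threshold   # shape: (num_gaussians,)

# Stand-in for a real CLIP text embedding, e.g. of "bowl of fruit".
mask = query(rng.standard_normal(clip_dim).astype(np.float32))
print(f"{mask.sum()} / {num_gaussians} Gaussians selected")
```

Storing one integer per Gaussian instead of a full CLIP vector keeps both memory and per-query cost low, which is consistent with the reported 0.011 s per-query time, though the paper's actual data structures may differ.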