# OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies

Open-vocabulary scene understanding using 3D Gaussian Splatting (3DGS) representations has garnered considerable attention. However, existing methods mostly lift knowledge from large 2D vision models into 3DGS on a scene-by-scene basis, restricting open-vocabulary querying to the training scenes and leaving the models unable to generalize to novel scenes. In this work, we propose OVGaussian, a generalizable open-vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR), which leverages a 3D neural network to learn and predict a semantic property for each 3D Gaussian point; these semantic properties can be rendered as multi-view consistent 2D semantic maps. Next, we propose a Cross-modal Consistency Learning (CCL) framework that uses the open-vocabulary annotations of 2D images and 3D Gaussians in SegGaussian to train a 3D neural network capable of open-vocabulary semantic segmentation across Gaussian-based 3D scenes. Experimental results demonstrate that OVGaussian significantly outperforms baseline methods, exhibiting robust cross-scene, cross-domain, and novel-view generalization.
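The key idea behind GSR is that semantics live on the Gaussians themselves and are rendered with the same alpha compositing as color, so every viewpoint composites the same per-Gaussian features and the resulting 2D semantic maps stay multi-view consistent. Below is a minimal sketch of that idea, assuming a simple MLP head and per-pixel front-to-back compositing in place of the full 3DGS tile rasterizer; all module names, dimensions, and attribute layouts are illustrative assumptions, not taken from the paper.

```python
# Minimal GSR-style sketch: predict a semantic feature per Gaussian, then
# alpha-composite those features the same way 3DGS composites color.
import torch
import torch.nn as nn

class SemanticHead(nn.Module):
    """Hypothetical 3D network mapping per-Gaussian attributes
    (position, scale, rotation, opacity, color, ...) to a semantic feature."""
    def __init__(self, in_dim: int = 14, feat_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, gaussian_attrs: torch.Tensor) -> torch.Tensor:
        # gaussian_attrs: (N, in_dim) -> (N, feat_dim) semantic features
        return self.mlp(gaussian_attrs)

def composite_semantics(feats: torch.Tensor, alphas: torch.Tensor) -> torch.Tensor:
    """Alpha-composite per-Gaussian semantic features into one pixel.

    feats:  (K, feat_dim) features of the K Gaussians covering this pixel,
            sorted front to back.
    alphas: (K,) per-Gaussian opacities after projection to this pixel.
    Returns a (feat_dim,) pixel feature; rendering every pixel this way
    yields a 2D semantic map that is consistent across views, because all
    views blend the same underlying per-Gaussian features.
    """
    # Transmittance of each Gaussian: product of (1 - alpha) of those in front.
    transmittance = torch.cumprod(
        torch.cat([alphas.new_ones(1), 1.0 - alphas[:-1]]), dim=0)
    blend = alphas * transmittance                 # (K,) compositing weights
    return (blend[:, None] * feats).sum(dim=0)
```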
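CCL then ties the two annotation modalities in SegGaussian together: the per-Gaussian predictions are supervised directly in 3D and, after rasterization, against the 2D image labels. A hedged sketch of such a joint objective follows, assuming CLIP-style normalized text embeddings as the open-vocabulary label space; the loss composition, temperature, and weighting here are assumptions rather than the paper's exact recipe.

```python
# Sketch of a cross-modal consistency objective: 3D (per-Gaussian) and
# 2D (rendered-map) predictions are both classified against text embeddings.
import torch
import torch.nn.functional as F

def ccl_loss(gaussian_feats, gaussian_labels,
             rendered_maps, image_labels,
             text_embeds, lambda_2d: float = 1.0):
    """Joint 3D/2D supervision for the semantic head.

    gaussian_feats:  (N, D) predicted per-Gaussian semantic features.
    gaussian_labels: (N,)   class indices of annotated Gaussians.
    rendered_maps:   (B, H, W, D) semantic maps rasterized from the features.
    image_labels:    (B, H, W)    class indices for the multi-view images.
    text_embeds:     (C, D) normalized text embeddings of the class names;
                     classifying against text is what keeps the label space
                     open-vocabulary. Temperature 0.07 is an assumption.
    """
    # 3D branch: each Gaussian is classified by similarity to text embeddings.
    logits_3d = F.normalize(gaussian_feats, dim=-1) @ text_embeds.t()   # (N, C)
    loss_3d = F.cross_entropy(logits_3d / 0.07, gaussian_labels)

    # 2D branch: rendered maps must agree with the image annotations, which
    # couples the 3D predictions to the 2D supervision signal.
    pix_logits = F.normalize(rendered_maps, dim=-1) @ text_embeds.t()   # (B,H,W,C)
    loss_2d = F.cross_entropy(pix_logits.permute(0, 3, 1, 2) / 0.07, image_labels)

    return loss_3d + lambda_2d * loss_2d
```

Because both branches score features against the same text embeddings, a class never seen in a given scene can still be queried at test time by embedding its name.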
