Skip to content

Latest commit

 

History

History
7 lines (5 loc) · 2.19 KB

2501.03229.md

File metadata and controls

7 lines (5 loc) · 2.19 KB

Gaussian Masked Autoencoders

This paper explores Masked Autoencoders (MAE) with Gaussian Splatting. While reconstructive self-supervised learning frameworks such as MAE learns good semantic abstractions, it is not trained for explicit spatial awareness. Our approach, named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions and spatial understanding jointly. Like MAE, it reconstructs the image end-to-end in the pixel space, but beyond MAE, it also introduces an intermediate, 3D Gaussian-based representation and renders images via splatting. We show that GMAE can enable various zero-shot learning capabilities of spatial understanding (e.g., figure-ground segmentation, image layering, edge detection, etc.) while preserving the high-level semantics of self-supervised representation quality from MAE. To our knowledge, we are the first to employ Gaussian primitives in an image representation learning framework beyond optimization-based single-scene reconstructions. We believe GMAE will inspire further research in this direction and contribute to developing next-generation techniques for modeling high-fidelity visual data.

本文探讨了结合高斯点绘制(Gaussian Splatting)的掩码自动编码器(Masked Autoencoders, MAE)。尽管重构型自监督学习框架(如 MAE)能够学习良好的语义抽象,它们却未针对显式的空间感知进行训练。我们的方法被称为 高斯掩码自动编码器(Gaussian Masked Autoencoder, GMAE),旨在联合学习语义抽象和空间理解。 与 MAE 类似,GMAE 在像素空间中端到端地重构图像,但不同于 MAE,GMAE 引入了一个基于 3D 高斯的中间表示,并通过高斯点绘制渲染图像。实验表明,GMAE 在保留 MAE 自监督表示高层语义质量的同时,还能实现多种空间理解的零样本学习能力(如前景-背景分割、图像分层、边缘检测等)。 据我们所知,GMAE 是首个在图像表示学习框架中使用高斯原语(Gaussian Primitives)超越基于优化的单场景重建的研究工作。我们相信,GMAE 将激发更多在这一方向上的研究,并为开发高保真视觉数据建模的下一代技术提供新的思路。