Photo-realistic scene reconstruction from sparse-view, uncalibrated images is in high demand in practice. Despite recent progress, existing methods either handle sparse views but require accurate camera parameters (i.e., intrinsics and extrinsics), or are SfM-free but need densely captured images. To combine the advantages of both families while addressing their respective weaknesses, we propose Dust to Tower (D2T), an accurate and efficient coarse-to-fine framework that jointly optimizes 3D Gaussian Splatting (3DGS) and camera poses from sparse, uncalibrated images. Our key idea is to first construct a coarse model efficiently and then refine it using warped and inpainted images at novel viewpoints. To this end, we first introduce a Coarse Construction Module (CCM), which exploits a fast Multi-View Stereo (MVS) model to initialize a 3DGS model and recover initial camera poses. To refine the 3D model at novel viewpoints, we propose a Confidence Aware Depth Alignment (CADA) module, which refines the coarse depth maps by aligning their confident regions with depths estimated by a monocular depth model. A Warped Image-Guided Inpainting (WIGI) module then warps the training images to novel viewpoints using the refined depth maps and applies inpainting to fill the "holes" in the warped images caused by view-direction changes, providing high-quality supervision to further optimize the 3D model and the camera poses. Extensive experiments and ablation studies validate D2T and its design choices: it achieves state-of-the-art performance in both novel view synthesis and pose estimation while remaining highly efficient.
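The depth-alignment step described for CADA can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the alignment is a global scale-and-shift fit, solved by confidence-weighted least squares, that maps the monocular depth onto the coarse depth using only pixels above a confidence threshold. The function name `align_depth`, the `thresh` parameter, and the choice of a linear model are all hypothetical.

```python
import numpy as np

def align_depth(coarse, mono, conf, thresh=0.5):
    """Fit scale s and shift t so that s * mono + t matches `coarse`
    on confident pixels (hypothetical stand-in for a CADA-style step).

    coarse, mono, conf: 2D arrays of identical shape.
    Returns the aligned monocular depth map together with (s, t).
    """
    mask = conf > thresh                      # keep confident pixels only
    if mask.sum() < 2:
        raise ValueError("need at least two confident pixels to fit s and t")

    x = mono[mask].astype(np.float64)
    y = coarse[mask].astype(np.float64)
    sw = np.sqrt(conf[mask].astype(np.float64))  # sqrt weights for weighted LSQ

    # Solve min over (s, t) of sum_i conf_i * (s * x_i + t - y_i)^2.
    A = np.stack([x, np.ones_like(x)], axis=1) * sw[:, None]
    b = y * sw
    (s, t), *_ = np.linalg.lstsq(A, b, rcond=None)

    return s * mono + t, s, t
```

In this sketch the aligned monocular depth plays the role of the refined depth map: it inherits the metric scale of the coarse reconstruction where that reconstruction is trusted, and supplies plausible values where it is not. A per-region or non-linear alignment would be an equally valid reading of the description.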