This software project accompanies the research paper, FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction.
FineRecon is a deep learning model for 3D reconstruction from posed RGB images.
pip install \
matplotlib \
pillow \
numpy \
scikit-image \
scipy \
timm \
torch==1.13 \
torchvision \
"tqdm>=4.65" \
trimesh \
pytorch_lightning==1.8 \
pyyaml \
opencv-python-headless \
python-box \
tensorboard
cp example-config.yml config.yml
The paths in config.yml will need to be edited to point to the data directories.
FineRecon requires an RGB-D scan dataset such as ScanNet, which can be downloaded and extracted using the scripts provided by the ScanNet authors.
The dataset structure expected by FineRecon is:
/path/to/dataset/
    test.txt
    train.txt
    val.txt
    first_scan/
        color/
            0.jpg
            1.jpg
            2.jpg
            ...
        depth/
            0.png
            1.png
            2.png
            ...
        intrinsic_color.txt
        intrinsic_depth.txt
        pose.npy
    second_scan/
    ...
    last_scan/
The files test.txt, train.txt, and val.txt should each contain a newline-separated list of scan directory names (e.g. first_scan) describing the test, train, and validation splits respectively. Each pose.npy contains the camera poses (world-to-camera transformation matrices) as an array of size (N, 4, 4) in npy format, where any invalid poses are marked with the value Inf. The files intrinsic_color.txt and intrinsic_depth.txt should contain the (4, 4) color and depth intrinsic matrices, respectively. In config.yml, the value of dataset_dir should be set to /path/to/dataset.
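As a quick sanity check of the pose convention described above, the following minimal sketch flags frames whose pose contains non-finite values. The helper name valid_pose_mask is hypothetical and not part of FineRecon; it assumes only the (N, 4, 4) shape and the Inf marker stated above.

```python
import numpy as np


def valid_pose_mask(poses: np.ndarray) -> np.ndarray:
    """Return a boolean mask of shape (N,) that is True for usable poses.

    Hypothetical helper, not part of FineRecon. It assumes `poses` has shape
    (N, 4, 4) and that invalid poses contain Inf, as described above; NaN
    entries are rejected as well.
    """
    if poses.ndim != 3 or poses.shape[1:] != (4, 4):
        raise ValueError(f"expected shape (N, 4, 4), got {poses.shape}")
    # A pose is usable only if all 16 of its entries are finite.
    return np.isfinite(poses).all(axis=(1, 2))


if __name__ == "__main__":
    # Synthetic example: three identity poses, the middle one marked invalid.
    poses = np.tile(np.eye(4), (3, 1, 1))
    poses[1] = np.inf
    print(valid_pose_mask(poses).tolist())  # [True, False, True]
```

For a real scan, the array would be loaded with np.load("/path/to/dataset/first_scan/pose.npy") before being passed to the helper.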
To generate the ground truth TSDF, run generate_gt_tsdf.py --dataset-dir /path/to/dataset --output-dir /path/to/gt_tsdf, and in config.yml set the value of tsdf_dir to /path/to/gt_tsdf.
To run training or inference with depth guidance, make sure depth_guidance.enabled is set to True in the config and set the value of depth_guidance.pred_depth_dir to /path/to/pred_depth, which should have the following structure:
/path/to/pred_depth/
    first_scan/
        depth/
            0.png
            1.png
            2.png
            ...
        intrinsic_depth.txt
    second_scan/
    ...
    last_scan/
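Before enabling depth guidance, it can help to confirm that every scan in a split has predicted depth in the layout shown above. The sketch below is one way to do that; read_split and scans_missing_pred_depth are hypothetical helpers, not part of FineRecon, and they only inspect file names without opening any image.

```python
from pathlib import Path


def read_split(split_file):
    """Read a newline-separated list of scan names, skipping blank lines."""
    lines = Path(split_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def scans_missing_pred_depth(pred_depth_dir, scan_names):
    """Return the scan names that lack depth/*.png or intrinsic_depth.txt.

    Hypothetical helper, not part of FineRecon. It checks only the directory
    layout described above.
    """
    root = Path(pred_depth_dir)
    missing = []
    for name in scan_names:
        scan_dir = root / name
        depth_dir = scan_dir / "depth"
        # At least one predicted depth image must be present.
        has_depth = depth_dir.is_dir() and any(depth_dir.glob("*.png"))
        has_intrinsics = (scan_dir / "intrinsic_depth.txt").is_file()
        if not (has_depth and has_intrinsics):
            missing.append(name)
    return missing
```

For example, scans_missing_pred_depth("/path/to/pred_depth", read_split("/path/to/dataset/test.txt")) returns the test scans that still need depth predictions; an empty list means the layout is complete.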
It can be helpful to limit inference to a set of pre-defined keyframes, because it's faster (particularly with point back-projection enabled) and because depth estimates may not be available for all frames. To do this, set test_keyframes_file in the config to the location of a JSON file with the following structure:
{
    "first_scan": [i0, i1, i2, ...],
    ...
}
where i0, i1, etc. are the integer indices of the keyframes.
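A keyframes file with this structure can be produced with a few lines of Python. The sketch below keeps every stride-th frame of each scan; write_keyframes_file is a hypothetical helper, not part of FineRecon, and uniform striding is an assumption made here for illustration rather than the selection strategy used by the authors.

```python
import json
from pathlib import Path


def write_keyframes_file(frame_counts, stride, output_path):
    """Write a keyframes JSON file that keeps every `stride`-th frame.

    Hypothetical helper, not part of FineRecon. `frame_counts` maps each scan
    name to its number of frames. Uniform striding is only one simple way to
    pick keyframes and is an assumption of this sketch.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    # Frame indices start at 0, matching the 0.jpg, 1.jpg, ... file names.
    keyframes = {
        scan: list(range(0, count, stride))
        for scan, count in frame_counts.items()
    }
    Path(output_path).write_text(json.dumps(keyframes, indent=2))
    return keyframes
```

Calling write_keyframes_file({"first_scan": 25}, stride=10, output_path="keyframes.json") writes a file mapping first_scan to [0, 10, 20]. If depth predictions exist for only some frames, the index lists should be restricted to those frames instead.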
python main.py
We provide pre-trained weights here: checkpoint.zip. These are weights for our main model using resolution-agnostic TSDF supervision, depth guidance, and point back-projection.
python main.py --task predict --ckpt path/to/checkpoint.ckpt
For convenience, we also provide the inference results (meshes) of our main model on the ScanNet test set:
- High-resolution [1 cm] (2.4 GB) → This is the resolution used in figures and metrics, unless otherwise stated.
- Low-resolution [4 cm] (148 MB)
Evaluation code and data for 3D metrics can be found in TransformerFusion, and evaluation code for 2D metrics can be found in Atlas.
@article{stier2023finerecon,
title={{FineRecon}: Depth-aware Feed-forward Network for Detailed 3D Reconstruction},
author={Stier, Noah and Ranjan, Anurag and Colburn, Alex and Yan, Yajie and Yang, Liang and Ma, Fangchang and Angles, Baptiste},
journal={arXiv preprint},
year={2023}
}