This repository collects a set of pipelines for working with image and point cloud data derived from Google Street View:
- util: data conversions, labelling, image resizing, and other utilities.
- collect-street-view: download panoramas and their corresponding depth maps from Google Street View.
- find-site-panos: find all Google Street View panoramas with an available depth map within a bounding rectangle.
- generate-gist: compute the GIST descriptor (Oliva & Torralba, 2001) of an image.
- generate-textonmap: generate the textonmap of an image.
- generate-segmentation: segment an image into its ground, sky, vertical, and porous components.
- encode-panos: encode panoramas as described in Naik et al. (2014).
- encode-images: encode images with a process derived from Naik et al. (2014).
- encode-point-clouds: encode point clouds with a process derived from Li et al. (2021). A CUDA option is to be added.
- cluster-data: a notebook interface for k-means clustering.
- train-svr: a notebook interface for support vector regression (SVR).
- segment-images: semantically segment an image with the OpenMMLab Semantic Segmentation Toolbox and Benchmark (mmsegmentation). A CUDA option is to be added.
- depth-segment-images: estimate the metric depth of an image with Depth Anything V2 (Yang et al., 2024). A CUDA option is to be added.
To use the pipelines described in a notebook, build and run its respective Docker image. For example, to use the pipelines in util.ipynb:
sh ./util/build-util.sh
sh ./util/run-util.sh
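The other pipelines are expected to follow the same pattern. As a sketch (the script names below are an assumption extrapolated from the util scripts above; check each pipeline's directory for the actual file names), the collect-street-view pipeline would be built and run with:

sh ./collect-street-view/build-collect-street-view.sh
sh ./collect-street-view/run-collect-street-view.sh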
To build all Docker images and start all containers, use Docker Compose:
docker-compose build
docker-compose up -d
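Once the containers are up, the usual Docker Compose commands apply. For example, to follow the logs of one container (assuming each pipeline is defined as a Compose service of the same name) or to stop and remove all containers:

docker-compose logs -f <pipeline name>
docker-compose down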
Follow the instructions in the respective notebook to use it. In general, provided the Dockerfiles have not been modified, you may connect to a pipeline's Jupyter server at http://localhost:<port>/tree?token=<pipeline name>. For example, you may connect to the Jupyter server for util.ipynb at http://localhost:8000/tree?token=util.
Port numbers are described in docker-compose.yml and stated in the respective notebooks.
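If you are unsure which port a container is mapped to, docker ps lists the running containers together with their port mappings:

docker ps --format "table {{.Names}}\t{{.Ports}}"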
The pipelines depend on the following projects:
- Depth-Anything-V2
- graph-segmentation
- lear-gist-python
- mmsegmentation
- PbLite-Contour-Detection
- Photo Pop-up
- rotation-invariant-pointcloud-analysis
- streetlevel
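If these dependencies are vendored as git submodules (an assumption; adjust to however this repository actually includes them), clone with submodules so they are available at build time:

git clone --recurse-submodules <repository URL>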
References:

Dubey, A., Naik, N., Parikh, D., Raskar, R., & Hidalgo, C. A. (2016). Deep Learning the City: Quantifying Urban Perception at a Global Scale. In Leibe, B., Matas, J., Sebe, N., & Welling, M. (Eds.), Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9905. Springer, Cham. https://doi.org/10.1007/978-3-319-46448-0_12

Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient Graph-Based Image Segmentation. International Journal of Computer Vision, 59, 167–181. https://doi.org/10.1023/B:VISI.0000022288.19776.77

Hoiem, D., Efros, A. A., & Hebert, M. (2005). Geometric Context from a Single Image. Tenth IEEE International Conference on Computer Vision (ICCV'05), Beijing, China, pp. 654–661. https://doi.org/10.1109/ICCV.2005.107

Li, F., Fujiwara, K., Okura, F., & Matsushita, Y. (2021). A Closer Look at Rotation-invariant Deep Point Cloud Analysis. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, pp. 16198–16207. https://doi.org/10.1109/ICCV48922.2021.01591

Naik, N., Philipoom, J., Raskar, R., & Hidalgo, C. (2014). Streetscore -- Predicting the Perceived Safety of One Million Streetscapes. 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, pp. 793–799. https://doi.org/10.1109/CVPRW.2014.121

Oliva, A., & Torralba, A. (2001). Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision, 42, 145–175. https://doi.org/10.1023/A:1011139631724

Salesses, P., Schechtner, K., & Hidalgo, C. A. (2013). The Collaborative Image of The City: Mapping the Inequality of Urban Perception. PLOS ONE, 8(7), e68400. https://doi.org/10.1371/journal.pone.0068400

Yang, L., Kang, B., Huang, Z., Zhao, Z., Xu, X., Feng, J., & Zhao, H. (2024). Depth Anything V2 (arXiv:2406.09414). arXiv. https://doi.org/10.48550/arXiv.2406.09414