Question: Scope of the package in terms of raster and vector support #205

Closed
goergen95 opened this issue Feb 10, 2024 · 10 comments

@goergen95

Hello @ctoney,

this is awesome work and I am looking forward to watching this package's development more closely.
It really enables a lot of workflows to handle geospatial data in R that were/are lacking in other projects.

This issue is just a question about your plans for future development. First, the name gdalraster
already suggests its main purpose, I guess. However, I already see first attempts to support OGR?

Personally, I see a great advantage in an R package supporting low-level access to both the raster and vector data models in GDAL/OGR. Is it planned to eventually add full support for both? Or is this already possible and I was just not able to see it?

I would appreciate hearing your thoughts on this.

@ctoney
Collaborator

ctoney commented Feb 12, 2024

@goergen95 I appreciate your comments and interest in the package. I wasn't planning to add vector support analogous to the interface for raster. The internal functions defined in src/ogr_util.h are the only vector support currently. Those were added for polygonize() which needs to set up the output layer in a potentially new vector dataset.

It should be straightforward to support vector using a class-based interface like that for raster (i.e., based around RCPP_EXPOSED_CLASS). There would probably be a vector dataset class which would break from the unified dataset in the GDAL API, but that seems fine. In the existing dataset class (gdalraster::GDALRaster), the underlying raster band objects in GDAL are accessed by method argument and don't exist as separate objects in R. I'm not sure if vector would be single-class. The interface could provide dataset and layer management, OGR layer operations on vector data sources, cursor on feature tables for access to attributes and geometry, read/write and perform operations on geometries. Is that the kind of low-level access to vector data you have in mind? That's sort of a larger project roughly similar in effort to what is there now for raster, I would guess.
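The single-class option mentioned above can be sketched roughly as follows. This is only an illustration in plain C++ with an in-memory stand-in for the data: `VectorDataset`, `Feature`, and every method name here are hypothetical, nothing calls GDAL/OGR, and no Rcpp module glue is shown. The point is that layers are addressed by method argument, the way bands are in the existing raster class, rather than existing as separate objects.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a feature: attribute fields plus a WKT geometry.
struct Feature {
    std::map<std::string, std::string> attributes;
    std::string wkt;
};

// Hypothetical single-class design. Layers are not separate objects;
// each method takes the layer name as an argument.
class VectorDataset {
public:
    // Dataset and layer management.
    void createLayer(const std::string& layer) {
        if (!layers_.emplace(layer, Layer{}).second)
            throw std::invalid_argument("layer already exists: " + layer);
    }
    std::size_t getLayerCount() const { return layers_.size(); }

    // Layer operations (std::map::at throws std::out_of_range for unknown layers).
    void addFeature(const std::string& layer, Feature f) {
        layers_.at(layer).features.push_back(std::move(f));
    }
    std::size_t getFeatureCount(const std::string& layer) const {
        return layers_.at(layer).features.size();
    }

    // Cursor over the feature table of one layer.
    void resetReading(const std::string& layer) { layers_.at(layer).cursor = 0; }
    bool getNextFeature(const std::string& layer, Feature& out) {
        Layer& l = layers_.at(layer);
        if (l.cursor >= l.features.size()) return false;  // end of table
        out = l.features[l.cursor++];
        return true;
    }

private:
    struct Layer {
        std::vector<Feature> features;
        std::size_t cursor = 0;  // per-layer read position
    };
    std::map<std::string, Layer> layers_;
};
```

A multi-class alternative would hand out a separate layer object holding its own cursor; the sketch keeps the cursor per layer inside the dataset so that a single exposed class would suffice.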

The existing OGR utility functions could be fleshed out and documented pretty easily, if having those would be worthwhile in general (I'm not sure and wasn't planning to specifically). A set of OGR utilities could be added to as needed, but the overall scope would be more limited.

Are there certain applications or capabilities that you're interested in for vector data sources?

@goergen95
Author

goergen95 commented Feb 14, 2024

Thank you very much for your detailed answer and for taking the time
to consider my use-case.

I think it would be best to describe in more detail what my use case actually looks like.

Over at mapme.biodiversity we try to make it seamless for analytical users
to calculate variables based on spatial data that is input for evidence-based
decision making. This is achieved by hiding the complexity of spatio-temporal
matching between the regions of interest and a diverse set of open geodata
(both raster and vector).

The data requests we receive are really quite diverse and we have to deal
with sources provided on a continuum ranging from cloud-optimized
data formats hosted on low-latency servers and searchable via STAC to original
research data in non-optimal data formats put into flat Zip-Files and hosted
on services like Zenodo/Figshare or some ftp server.

Now, the easy answer to such requests would be that we'll have to wait until someone
takes care of putting this less accessible data into well-designed formats in some
cloud bucket, but that is not really an option.

So, to take mapme.biodiversity a step forward, I am currently exploring ways to leverage
GDAL's unified API to access different kinds of geospatial data. Behind the scenes, the implementation
I am trying to develop does a lot of cataloging of the source locations of spatial data sets,
opening options, spatial footprints, etc. Users might request to create "copies" of these
sources to their local file systems or an attached cloud storage (though I am currently
leaning towards "translating" to make sure data is in proper formats for random access).
During read-time, only the portion intersecting with a region of interest will be read.
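For raster sources, reading only the intersecting portion comes down to converting the region of interest into a pixel window. A minimal sketch in plain C++, assuming a north-up raster (no rotation terms) and a geotransform laid out in GDAL's six-coefficient order; `BBox`, `Window`, and `roiToWindow` are hypothetical names introduced here, not part of any package:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

struct BBox { double xmin, ymin, xmax, ymax; };  // region of interest, in raster CRS
struct Window { int xoff, yoff, xsize, ysize; }; // pixel offsets and sizes

// gt uses GDAL's geotransform order: gt[0] = x of upper-left corner,
// gt[1] = pixel width, gt[3] = y of upper-left corner, gt[5] = pixel height
// (negative for north-up). Assumption: gt[2] and gt[4] (rotation) are zero.
// Returns false when the ROI does not overlap the raster.
bool roiToWindow(const std::array<double, 6>& gt, int rasterXSize,
                 int rasterYSize, const BBox& roi, Window& win) {
    // Fractional pixel coordinates of the ROI edges.
    double colMin = (roi.xmin - gt[0]) / gt[1];
    double colMax = (roi.xmax - gt[0]) / gt[1];
    double rowMin = (roi.ymax - gt[3]) / gt[5];  // top edge gives the smallest row
    double rowMax = (roi.ymin - gt[3]) / gt[5];

    // Expand to whole pixels, then clamp to the raster extent.
    double x0 = std::max(0.0, std::floor(colMin));
    double y0 = std::max(0.0, std::floor(rowMin));
    double x1 = std::min(static_cast<double>(rasterXSize), std::ceil(colMax));
    double y1 = std::min(static_cast<double>(rasterYSize), std::ceil(rowMax));
    if (x1 <= x0 || y1 <= y0) return false;  // no overlap

    win.xoff = static_cast<int>(x0);
    win.yoff = static_cast<int>(y0);
    win.xsize = static_cast<int>(x1 - x0);
    win.ysize = static_cast<int>(y1 - y0);
    return true;
}
```

The resulting offsets and sizes are in the form that windowed raster reads in GDAL take, so only those pixels would need to be fetched from the source.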

I am thus less interested in another analytical data model (I think sf and terra do just fine),
but in a more direct access to GDAL's I/O capabilities (which in my POV sf/terra were not designed for,
even though one can actually get quite far with their GDAL bindings already - it just does not feel natural).

In essence, it seems gdalraster already provides the required functionality for this use case
in terms of raster data sets and if the same level of access was available for vector data
I think it could make sense for mapme.biodiversity to do the data cataloging and I/O stuff via
gdalraster.

I hope this is helpful to clarify the context of my question.

@mdsumner
Collaborator

mdsumner commented Feb 14, 2024

@goergen95 for your use case I would have a look at the new GTI driver in GDAL (targeted for 3.9). I'll have a closer look, but I think there's scope there for your needs.

And ... just to throw these in: I also noticed the projects below. I have nothing more to add at the moment, but I want to make sure more people see the breadth of community interest (which we could potentially consolidate better than has been done in recent years). I'm very keen to get a real API for GDAL in R, and extending gdalraster to vector is as good as any other means (I use the osgeo package in Python via reticulate as my real benchmark, or in Python directly, of course):

https://github.com/caiohamamura/gdalBindings-r

https://github.com/brownag/ROGRSQL

(there's also VicMapR that grew out of bcdata that has a lot in terms of the "lazy tbl" for gdal vector that would be so valuable to have control of, and of course the geoarrow efforts are a good stream to align to for a community foundation).

There are so many disparate pieces that I actually need to keep notes on them now, so I thought I'd throw these in for visibility.

🙏

@ctoney
Collaborator

ctoney commented Feb 18, 2024

@goergen95, thanks for the description and background on mapme.biodiversity. That helps a lot. I started playing with the package a bit which also helped me understand better and appreciate the workflows you're implementing. It's a very interesting project. I'm glad to know about it and look forward to following. I may have a question or two once I'm able to spend a little more time working with it.

Also appreciate the input from @mdsumner on this topic. I wonder about the current read support for vector data sources that vapour provides, and future plans for its development as well.

I think there are certain advantages to implementing with Rcpp modules, and RCPP_EXPOSED_CLASS in particular. It provides a natural interface to the underlying class-based model. It is straightforward to persist a dataset pointer on the R side. Exposing C++ classes directly in R reduces the amount of wrapper code required (e.g., that would be needed for an S4 interface?), making the code simpler, and so faster to develop and easier to maintain. I expect opinions would differ though, and of course there are trade-offs and personal preference. The interface in gdalraster is not idiomatic R, but it does provide direct API bindings with more complete and customized access to the library. That is beneficial or even required for certain use cases, but more niche compared with the analytical data models.

I'm open to working on a vector interface if there is rationale for it. Not sure exactly what that would look like though. I haven't thought through it enough to really gauge the amount of effort, but a good bit of work obviously. I wonder about time frame for something to still be useful in your project (not that there needs to be specific commitment or anything like that). Other routes could be better but I expect you've explored several options at this point. Another consideration is that the use cases for a low level vector API in R seem somewhat more limited and specialized than for raster. Appreciate any further comments or suggestions. If we do consider implementing something then discussion of a few details would be helpful.

@mdsumner
Collaborator

mdsumner commented Feb 19, 2024

well, I would really like to (re) implement a lot of vapour in the context of Rcpp modules, and I think a gdalvector counterpart to this package is a good start - I'd benefit a lot from even the bare bones of a vector source info implemented via modules - so if that's easy for you to get started I think it would be easy (easier) for me to pick it up from that start.

There's a lot of vapour that is a mess, and it needs to stay backwards compatible so I'm inclined to start again in a new package. I will get to it myself at some point, but fwiw I could probably pick up from progress by others and I'd be happy to stick to a list of features to implement first.

I very often go to osgeo.gdal or osgeo.ogr to get what I want, but this package has quickly become very rich in those features 👌

@ctoney
Collaborator

ctoney commented Feb 19, 2024

@mdsumner, thanks for your feedback. That is good to know. Pending additional comments from Darius, I will follow up on a potential proposal for development. Mainly, I'd like to know there is reasonable consensus that adding this functionality would benefit known use cases, and that it makes sense to add it here based on the existing interface. I'm certainly open to hearing anything counter to that.

@goergen95
Author

Really appreciate the discussion. Agreed that opinions might differ on whether "only" giving users pointers instead of a fully fledged interface is helpful. On the other hand, if this package were to provide modularized low-level access to GDAL/OGR functionality, it could actually be used by others creating higher-level interfaces (that is what I understand from your last comment @mdsumner, or did I get something wrong?). I already see this as a great benefit for the community because people could actually just start building things without being doomed to use one of the available data models that might not have been created with their use case in mind.

Concerning the functionality, I think anything along the lines of osgeo.ogr is a win. I would also not bind any potential developments here to what is happening in MAPME. I just announced a rather tough schedule over there for some outstanding changes that we need to properly work with the package in a new cloud environment. I do have a plan to achieve this with the usual spatial packages. But also, I would not hesitate to make use of gdalraster in the future once there is proper vector support.

@ctoney
Collaborator

ctoney commented Feb 21, 2024

Appreciate the helpful discussion. I will work on a draft of proposed bindings to OGR based on the current implementation for raster using Rcpp modules. It will include a description of the interface along with an initial attempt at a class definition and method signatures for the core functionality. I'll post that here once ready, for possible review and feedback. Thanks for considering.

@ctoney
Collaborator

ctoney commented Mar 1, 2024

@goergen95, @mdsumner -
I opened issue #241 for possible comments on design, features, potential issues etc. It links to a draft description of a vector interface, and a prototype with partial implementation. Thanks again for your interest in this.

@goergen95
Author

Thanks, looking forward to diving in. Closing here then.

3 participants