
Aggregation methods, polygon tables #22

Open
njmattes opened this issue Mar 7, 2016 · 15 comments

Comments

@njmattes
Contributor

njmattes commented Mar 7, 2016

@ricardobarroslourenco Once we have a finalized procedure for returning a raster of values, we should finish at least one procedure for aggregating those values to polygons. But we don't have polygons yet, I just realized. We need to sort out a way to store these ASAP (this is also mentioned in issue #9). @legendOfZelda Do you know if plenario already has an ingestor for Shapefiles in place?
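A minimal sketch of what one such aggregation procedure could look like, assuming the raster arrives as a list of (x, y, value) cell centers and a polygon as a ring of (x, y) vertices. This is pure-Python ray casting for illustration only; all names are hypothetical and none of this is from the codebase:

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring [(x0, y0), ...]?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal ray at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def aggregate_mean(cells, ring):
    """Mean of raster cell values whose centers fall inside the polygon."""
    vals = [v for (x, y, v) in cells if point_in_polygon(x, y, ring)]
    return sum(vals) / len(vals) if vals else None

# unit square polygon; two cells inside, one outside
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
cells = [(0.25, 0.5, 10.0), (0.75, 0.5, 20.0), (1.5, 0.5, 99.0)]
print(aggregate_mean(cells, square))  # → 15.0
```

In practice PostGIS can do the point-in-polygon part server-side, but the shape of the computation is the same: filter cells by containment, then reduce.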

@ricardobarroslourenco
Member

@njmattes there is a shp2pgsql tool (analogous to raster2pgsql) described in a tutorial. I saw that it is also possible to get a GeoJSON-compliant polygon and ingest that. Would the shapefiles to ingest be the GADM regions?

@njmattes
Contributor Author

njmattes commented Mar 8, 2016

Yes, I think we can start with just the GADM0 regions, unless it's easier to ingest all of them (GADM1, GADM2) at once. Joshua has mentioned FPU regions, but I don't have outlines for those.

@ricardobarroslourenco
Member

@njmattes we could load them in batches, especially in case something goes wrong and we need to roll back. Do we have all those shapefiles already downloaded, or do we still need to get them from the GADM website?

@njmattes
Contributor Author

njmattes commented Mar 8, 2016

I don't have the GADM files anymore (they're huge). You can download the entire world here: http://www.gadm.org/version2.

@ricardobarroslourenco
Member

@njmattes I've created a folder at /var/www/gadm and downloaded the entire world there (by the way, super fast - 334MB in 14s). It is still zipped, and I would like to hear from @legendOfZelda if he tried to ingest shapefiles on his tests, and if we need to change something prior to ingest.

@ghost ghost self-assigned this Mar 10, 2016
@ghost

ghost commented Mar 11, 2016

i ingested the .shp file into a table called regions using ogr2ogr. the table with its columns was created by ogr2ogr itself, but we don't really want all these columns (varname_1, nl_name_1, engtype_1, etc.). all we need is a primary key, geom, and a JSON with the metadata (which can contain the geometry again; not super efficient in space, but convenient).

besides, a new .shp file might have attributes other than those, forcing us to let ogr2ogr create a new table for that .shp file. however, it is possible to append a .shp file to an existing PostGIS table as explained here: http://spatialmounty.blogspot.com/2015/05/ogr2ogr-append-new-shapefile-to.html. that requires mapping the attributes in the .shp file to attributes that already exist in the table, but it will lead to cumbersome NULLs anyway because shapefiles are not uniform, i.e. they don't always have the same metadata fields.

so, because of that, i again suggest we use a table regions(uid=primary key, geom=geometry, meta_data=JSON) that is able to store heterogeneous shapefiles (i guess that goes with your question, @ricardobarroslourenco, of whether we need to 'change something prior to ingest'...?). to achieve that i am planning to use ogr2ogr to first convert a .shp to a .geojson and then ingest it document by document, also making use of PostGIS's ST_GeomFromGeoJSON.

let me know if you see a better approach.
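The shp → geojson → row-by-row route could reduce to something like the sketch below. The table and column names follow the regions(uid, geom, meta_data) suggestion above; the actual database call (e.g. via psycopg2) is only indicated in a comment, and the helper name is illustrative:

```python
import json

def feature_to_insert(feature, table="regions"):
    """Build a parameterized INSERT for one GeoJSON feature.
    The geometry is passed as a GeoJSON string and parsed server-side by
    PostGIS's ST_GeomFromGeoJSON; the feature's properties go into the
    meta_data JSON column, so heterogeneous shapefiles share one table."""
    geom_json = json.dumps(feature["geometry"])
    meta_json = json.dumps(feature.get("properties", {}))
    sql = ("INSERT INTO {} (geom, meta_data) "
           "VALUES (ST_GeomFromGeoJSON(%s), %s)").format(table)
    return sql, (geom_json, meta_json)

feature = {
    "type": "Feature",
    "geometry": {"type": "Polygon",
                 "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
    "properties": {"NAME_0": "Testland"},
}
sql, params = feature_to_insert(feature)
# with a real connection one would then do: cursor.execute(sql, params)
```

Using a parameterized statement keeps the GeoJSON strings out of the SQL text itself, which matters once region names contain quotes.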

@ricardobarroslourenco
Member

As said on Monday, for this prototype it is OK to use ogr2ogr. But in a further iteration it would be interesting to replace it with a more customized function, to avoid generating blank columns.

@ghost

ghost commented Mar 30, 2016

how about this schema here?

CREATE TABLE regions_meta (
    uid         bigserial primary key,
    name        text,    -- e.g. GADM, EEZ
    version     text,    -- for e.g. GADM: 1, 2.0, 2.7, 2.8;
                         -- for e.g. EEZ: 1, 2, 3, ..., 6, 6.1, 7, 8
    attributes  text[]   -- for e.g. GADM v2.8: ID_0, NAME_0, VARNAME_0, TYPE_0, ...
                         -- for e.g. EEZ v8:    ID, OBJECTID_1, EEZ, Country, Sovereign, Remarks, ...
);

CREATE TABLE regions (
    uid         bigserial primary key,
    geom        geometry,
    meta_data   jsonb,   -- key-value pairs with keys = attributes from regions_meta
    meta_id     bigint references regions_meta(uid)
);

to support GADM, EEZ (Exclusive Economic Zones Boundaries), and others, e.g. the ones mentioned in http://www.gadm.org/links
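Under that schema, a shapefile record's attributes would be filtered against the dataset's registered `attributes` before landing in `meta_data`, so stray per-file fields don't become NULL-filled columns. A small sketch (function name and sample attribute lists are illustrative, not from the codebase):

```python
def build_meta_data(properties, registered_attributes):
    """Keep only the attributes registered for this dataset in regions_meta;
    anything else in the shapefile record is dropped rather than becoming
    a NULL-filled column in a wide table."""
    return {k: v for k, v in properties.items() if k in registered_attributes}

# hypothetical attribute list as it might be stored in regions_meta for GADM
gadm_attrs = ["ID_0", "NAME_0", "VARNAME_0", "TYPE_0"]
props = {"ID_0": 42, "NAME_0": "Testland", "SHAPE_AREA": 1.23}
meta = build_meta_data(props, gadm_attrs)
print(meta)  # → {'ID_0': 42, 'NAME_0': 'Testland'}
```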

@ghost ghost mentioned this issue Mar 30, 2016
@ghost

ghost commented Mar 31, 2016

i am almost done writing the ingestion script for shapefiles such as GADM, EEZ (exclusive economic zones), etc. once we have these tables filled up, we can specify polygons by ID, which again is convenient when specifying the flask URL.
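Looking up a polygon by its primary key, as a URL like `/polygon/<uid>` would need, reduces to a single parameterized query. A stdlib-only sketch; the flask wiring is only indicated in a comment, and the column names follow the regions schema proposed above:

```python
def polygon_query(uid):
    """Parameterized lookup of one region by primary key; the geometry
    comes back as GeoJSON via PostGIS's ST_AsGeoJSON, ready to serve."""
    if not isinstance(uid, int) or uid < 1:
        raise ValueError("uid must be a positive integer")
    sql = ("SELECT uid, ST_AsGeoJSON(geom), meta_data "
           "FROM regions WHERE uid = %s")
    return sql, (uid,)

sql, params = polygon_query(7)
# in a flask view one might do:
#   cursor.execute(sql, params); row = cursor.fetchone(); return jsonify(...)
print(params)  # → (7,)
```

Validating the ID before it reaches the database keeps malformed URL segments from ever touching SQL.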

@ghost

ghost commented Apr 1, 2016

ok, ingestion into the above 2 tables works; i just ingested EEZ (GADM works too, but it just takes too long for now). now i can continue with the flask routes, and within the urls i can conveniently use polygon IDs.

@ricardobarroslourenco
Member

Nice. How long did it take to load? I just arrived at the CI and I'll be reviewing stuff.

@ghost

ghost commented Apr 1, 2016

took roughly 5 minutes to ingest EEZ, but i'm still ingesting one polygon at a time, so there's lots of potential to speed it up. it's all still in the severin branch, not yet merged into develop. the new tables are, as usual, in models.py and the shapefile ingestion script is ingest/ingest_shapes.py.
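One cheap way to speed up one-polygon-at-a-time ingestion is to batch the inserts, e.g. one executemany() round trip per chunk of features. A sketch with an illustrative chunk size; the helper is not from ingest_shapes.py:

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of up to `size` items from any iterable, so
    features can be ingested one batch at a time instead of one row at a
    time (fewer round trips, and a natural unit for commit/rollback)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

rows = [("geom%d" % i, "{}") for i in range(5)]
batches = list(chunks(rows, 2))
print([len(b) for b in batches])  # → [2, 2, 1]
# in the ingestion loop, roughly:
#   for batch in chunks(feature_rows, 500):
#       cursor.executemany(insert_sql, batch)
#       connection.commit()   # one commit per batch eases rollback too
```

Committing per batch also fits the earlier point about loading in batches so a failure only rolls back the current chunk.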

today i will resume the work on the flask routes in views.py.

@ricardobarroslourenco
Member

Ok. I'll look into the severin branch. I'll also be reviewing the architecture we are using, as @njmattes asked on Monday.

@ricardobarroslourenco
Member

@legendOfZelda about /ede/cache_builder.py: is it a cache that fetches records over space, while caching all the time frames that are present?

@ghost

ghost commented Apr 1, 2016

hm, haven't looked into cache_builder.py yet, but we can discuss it today.
