home

Welcome to the soc-maps wiki!

Toward a Coherent System of Managing and Publishing Geospatial Information

Managing Spatial Data Repositories

Concept

CCCS maintains a separate geospatial data repository for publicly available data that can be displayed on the CCCS website. This repository is distinct from the data specific to each of our client projects (which typically include a mixture of both public and proprietary data sources). CCCS' archive of publicly available data is currently managed via 'GitHub' at /cccs-web/soc-maps/.

Private client data is managed independently and presented on a cloned version of the CCCS website. To maintain the greatest control over client data (as well as privacy with regard to our client-specific work), we manage client-project repositories on CCCS web servers, using the 'Gitlab' software package to provide a graphical user interface (GUI) for repository management and version control. The 'Gitlab' software also includes a wiki platform for meta-level discussion of the project and file system, as well as a task "ticketing" system for raising work requests ("issues"), assigning these requests, and tracking their completion.

As indicated above, CCCS wishes for all work to be viewable easily within the CCCS website [https://github.com/cccs-web] as a unified "user environment" for technical writers as well as for clients.

CCCS clients have cloned versions of CCCS' data management and web-publishing infrastructure to allow for self-contained "data dumps" to be provided to clients so that they can be spun-up as unique machines.

Our first and most fundamental data-management challenge is how to store and link GIS data (and databases) so that data are appropriately managed, version-controlled, stored and shared across projects. We wish to avoid excessive and untracked branching of information.

A related issue is the format for data storage. Most GIS data we receive come in the form of shapefiles. To archive these data (and to preserve along with them the associated metadata, including 'original' file names and the name of the data source), CCCS is currently using PostgreSQL. Our idea is that using databases to manage shapefiles will also facilitate better interoperability and referential integration with other data types. Among database applications, PostgreSQL is arguably the best suited to managing spatial data. CCCS is not yet sufficiently informed, however, to judge whether the added effort of using a database is worthwhile.
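As a minimal sketch of the idea above (keeping the original file name and the data source alongside each imported shapefile), the Python snippet below builds a `shp2pgsql` command line and a small metadata record for one file. Nothing here is taken from CCCS' actual import script: the helper names, the table-naming rule, the example file name, the `source` label and the SRID of 4326 are all assumptions for illustration.

```python
import re
from pathlib import Path


def table_name_for(shapefile: Path) -> str:
    """Derive a lowercase, SQL-safe table name from the shapefile's stem."""
    name = re.sub(r"[^a-z0-9_]+", "_", shapefile.stem.lower()).strip("_")
    # Avoid table names that start with a digit.
    return f"t_{name}" if name[:1].isdigit() else name


def import_plan(shapefile: str, source: str, srid: int = 4326) -> dict:
    """Return the shp2pgsql argv plus the metadata to keep with the table.

    Nothing is executed here; the SQL that shp2pgsql prints would still
    need to be piped into psql against the target database.
    """
    path = Path(shapefile)
    table = table_name_for(path)
    return {
        # -s sets the SRID (assumed), -I requests a spatial index.
        "argv": ["shp2pgsql", "-s", str(srid), "-I", str(path), table],
        "metadata": {
            "table_name": table,
            "original_file_name": path.name,
            "data_source": source,
        },
    }


# Hypothetical file name and source label, for illustration only.
plan = import_plan(
    "/home/sync/cccs-maps/public/Admin Boundaries-2014.shp",
    source="hypothetical provider",
)
print(plan["argv"])
print(plan["metadata"])
```

The metadata record could be written to a small catalogue table in the same database, so that the 'original' file name survives even though the table name has been normalised.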

Initial Approach and Current Standing

CCCS' GIS data repository

The /cccs-web/soc-maps/ repository was established for developing a web application to showcase our capacities in spatial analysis and the production of cartographic data visualization. We would like to make greater use of geospatial analysis to support our work in the field of social impact assessment. We would like our web application to be presented through our Django web framework, hosted at crossculturalconsult.com.

CCCS' early development work used GitHub to version-control both the web mapping application and the spatial data that we intended to be referenced by that application. While this approach made it convenient to share both the spatial data and the application's code base with our various developers and application users through one repository, both CCCS' IT expert, Paul Whipp, and our GIS consultants, Kartoza Pty., recommended against this 'combined' approach. The Git version-control system is poorly suited to managing spatial data, primarily because repositories holding such data tend to grow too large for Git to be effective [e.g., one tends to start encountering timeout errors when cloning and pulling larger repositories, and the system may not be able to manage any single file over 2 gigabytes in size (such as a dump of a spatial database)].
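To see whether a working folder is likely to run into the size problem described above, a short script can list the files that exceed a given limit. This is only a sketch: the 2-gigabyte threshold is the figure quoted in the text, and the function name is hypothetical.

```python
import os
from pathlib import Path

# Per-file limit mentioned in the text above (2 gigabytes).
LIMIT_BYTES = 2 * 1024 ** 3


def oversized_files(root, limit_bytes=LIMIT_BYTES):
    """Return (path, size) pairs for files under root larger than limit_bytes."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune Git's own object store so only working files are reported.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # skip links, which may be broken
            size = path.stat().st_size
            if size > limit_bytes:
                found.append((path, size))
    # Largest files first.
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in oversized_files("."):
        print(f"{size / 1024 ** 3:.2f} GiB  {path}")
```

Run from the top of a repository, this prints nothing when every file is under the limit; raster files and database dumps are the most likely candidates to appear.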

Following the advice of our consultants, CCCS adopted an alternative system for sharing spatial data using btsync. While this approach has the advantage of allowing us to share larger repositories of spatial data outside of Git, it has the disadvantage of creating parallel data sources (i.e., the shared btsync repository on the client side and the server-side 'production' repository). Another disadvantage of the btsync approach is that data are not version-controlled. We therefore view the btsync data-management approach as a 'temporary' solution.

With regard to the creation and management of geospatial data in database format, Kartoza Pty has scripted database imports for CCCS' public repository as well as for one of our client project repositories. In its initial formulation, this import script loaded data into a PostgreSQL database via a Docker container [a software application that CCCS has requested be eliminated from our software application stack]. CCCS has since dumped these databases from within Docker and imported them directly into PostgreSQL on our server, renaming them in the process to:

Copies of these database dumps are available via their respective repositories (linked above).

PLEASE NOTE: The docker-oriented shapefile data import script.

The CCCS server is organized as follows:

CCCS web

The CCCS website is installed under the cccs user. Under this user, you'll find the following directories (which should be self-explanatory):

cccs/production
cccs/staging
cccs/mezzanine

CCCS-Abadi

The CCCS-Abadi website is installed under the 'abadi' user. Under this user, you'll find the following directories:

abadi/production
* this is the home of 'abadi-web', a Django application

abadi/production_engagement
* this is a test application to extend 'abadi-web'; its role is to facilitate tracking of project engagement with community and public stakeholders; the app is in very early development

abadi/production_map_app
* this is a test mapping application to extend 'abadi-web'; we were trying out 'leaflet' and 'ol3js', but we found that the work of putting materials online was too time-intensive and depended too much on the input of programmers; we need a quick-and-dirty tool to get maps online, which is why we turned to Kartoza for support

abadi/production_esms_map
* this folder is linked to the Abadi GitLab esms-map repository (http://gitlab.crossculturalconsult.com/abadi/esms-maps), which is forked from 'soc-maps' on GitHub

The shapefiles folder was located in /home/sync/map-app/source_material/shapefiles/. The script from GitHub was looking in /home/sync/cccs-maps/, so either way we needed to add the absolute path in order for the loader to find the shapefiles.

I started by creating a cccs-maps folder in the sync directory. Within that folder, I changed the paths a bit for clarity. We therefore now have the following:

For CCCS public websites:

/home/sync/cccs-maps/public/

For CCCS private websites:

/home/sync/cccs-maps/private/

Within the 'private' folder, we have our client projects:

/home/sync/cccs-maps/private/abadi/
/home/sync/cccs-maps/private/wbn
/home/sync/cccs-maps/private/worldbank_russia

I therefore moved all the shapefiles from map-app (which can be deleted) to reside either in /home/sync/cccs-maps/public/ (if public) or in /home/sync/cccs-maps/private/abadi/ (if private).
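The public/private layout above can be captured in a few lines of Python, which may help keep import scripts and sync settings pointing at the same folders. The paths and client names come from this page; the function and constant names are hypothetical.

```python
from pathlib import PurePosixPath

SYNC_ROOT = PurePosixPath("/home/sync/cccs-maps")
CLIENT_PROJECTS = {"abadi", "wbn", "worldbank_russia"}


def destination_dir(client=None):
    """Return the sync folder for a dataset.

    client=None means the data are public; otherwise client must be one of
    the known client projects under the 'private' folder.
    """
    if client is None:
        return SYNC_ROOT / "public"
    if client not in CLIENT_PROJECTS:
        raise ValueError(f"unknown client project: {client!r}")
    return SYNC_ROOT / "private" / client


print(destination_dir())         # /home/sync/cccs-maps/public
print(destination_dir("abadi"))  # /home/sync/cccs-maps/private/abadi
```

Rejecting unknown client names is a deliberate choice in this sketch: it keeps a misspelled project from silently creating a new folder under 'private'.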

I am also adding in some of the 'missing' files that Admire has been looking for. This includes the 'raster' files from the abadi dataset: /home/sync/cccs-maps/private/abadi/mtb_raster, as well as loading in some US government data to CCCS public: /home/sync/cccs-maps/public/usgs_earthexplorer

In sum: All of the previous data are in the sync drive. Much more data are added. I started moving the files to get them to correspond better to the script, but in the process decided to establish clear 'public' and 'private' bins, so that later we can separate out these repositories entirely.

I have deleted the git repositories as agreed. I also disconnected and re-connected my /sync/ folder according to the new structure, thus:

/home/sync/cccs-maps/public [15.0 GB] connected with key: A3Z2MYPIVV6D6WIXLRR7H5R7QAQLMY545

/home/sync/cccs-maps/private [1.5 GB] connected with key: A2K6DCLRIJHJF2UALJJLBGXBEDNPICWTP

I am no longer syncing with the original key that Tim provided.

Please let me know when we're all connected.

I remain a bit behind with my local testing of the docker application. Please excuse me. I have a major deliverable to our client on Monday, so it will be hard for me to dedicate much time and concentration before then.

Overcoming Challenges—Next Steps

CCCS and Kartoza Pty continue to explore options for spatial data management. We have tentatively settled on using the 'GeoGig' platform for distributed version-control of shapefile data. The GeoGig software allows users to import raw geospatial data (currently from Shapefiles, PostGIS or SpatiaLite).

Can I add a few more moving parts into the conversation? I have had a play around with your spatial datasets and my first recommendation is that you actually don't store these in Git, but in GeoGig [1] (formerly GeoGit) for the vector datasets and as a private btsync [2] share for the raster data. The reason for my suggestion to use GeoGig is that Git doesn't make a very good platform for versioning geospatial data unless it is in some plain text format like GML (Geography Markup Language) or CSV. Also the raster data in your git repo is very large and makes working with git awkward. I spent a bit of time making a diagram showing how I see the high level architecture working (attached). Yes, there would be a bit of pushing and pulling between repos and branches, but I think that is fine and just the normal modus operandi for working in Git.

As far as working against CCCS's git repo, I think that sounds just fine. We will make a fork in GitHub and make pull requests to your repo. Hopefully my diagram will make it clear whether I understand your intention well enough and will relay what I am thinking too. Feel free to tweak it if needed using the link I have provided below [3].

Kartoza has done some work investigating the use of the GeoGig application, though CCCS has yet to receive a briefing on the details of this work. Kartoza's invoices to CCCS suggest that most of the work relates to embedding GeoGig into a Docker image. CCCS has asked Kartoza to review the work charged to CCCS on this matter, on the grounds that our request to stop all Docker-related development was made before this work was conducted.

With regard to moving forward with 'GeoGig':

CCCS needs to create GeoGig repositories for 'public' data and for each of our 'client' projects. The GeoGig repositories must have their 'MASTER' branch hosted on CCCS' servers.

Progress in this regard is as follows:

We have installed GeoGig to our data server:
geogig@ip-10-167-186-14:/home/aaron$ geogig version

Project Version : 1.0-beta1
Build Time : August 14, 2014 at 17:44:46 ART
Build User Name : Gabriel Roldan
Build User Email : [email protected]
Git Branch : r1.0-beta1
Git Commit ID : 9aae709f4f451802a09c14293c92a46372c868bd
Git Commit Time : August 14, 2014 at 17:43:33 ART
Git Commit Author Name : Gabriel Roldan
Git Commit Author Email : [email protected]
Git Commit Message : Set version to 1.0-beta1
We have enabled the user 'geogig' to call the geogig application by adding its location to the PATH variable in the user's .bashrc file.
We created a DNS entry to direct traffic coming in from http://geogig.crossculturalconsult.com to our desired geogig server.
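The PATH step above can be made repeatable with a small helper that appends the export line only if it is not already present. This is a sketch under stated assumptions: the function name is hypothetical, and the GeoGig install directory and the location of the .bashrc file are not given on this page, so the values in the usage note are placeholders.

```python
from pathlib import Path


def ensure_path_entry(bashrc, bin_dir):
    """Append an export line adding bin_dir to PATH, unless already present.

    Returns True if the file was changed, False if the line was already there.
    """
    bashrc = Path(bashrc)
    line = f'export PATH="$PATH:{bin_dir}"'
    existing = bashrc.read_text() if bashrc.exists() else ""
    if line in existing.splitlines():
        return False
    # Make sure the new entry starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with bashrc.open("a") as handle:
        handle.write(f"{separator}{line}\n")
    return True
```

A call such as `ensure_path_entry("/home/geogig/.bashrc", "/opt/geogig/bin")` (both paths hypothetical) would add the entry once; running it again leaves the file unchanged. The change takes effect in new login shells, or after sourcing the file.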

Remaining tasks to get the GeoGig set-up working are:

We need to configure nginx appropriately to allow us to push and pull data to and from each GeoGig project.
We need to import existing data into GeoGig. While the GeoGig documentation suggests that it is possible to import PostgreSQL data as well as shapefiles, it remains unclear to CCCS how such imports are to occur. That is, what happens if we import both a PostgreSQL database and shapefiles? Does GeoGig automatically restructure the data and allow us to re-export these files as a single PostgreSQL database, or does it keep each file entity separate? Would the GeoGig system act as the database, or is it used only to version-control changes to a database? Does GeoGig also allow us to manage regular data tables, such as socio-economic and census data? We'd like some coaching and tutorials on how data management should occur with GeoGig as our version-control system.
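As a starting point for the import task, the sketch below assembles the GeoGig command sequences for a shapefile import and for a PostGIS table import, and prints them without running anything by default. It does not answer the open questions above. The subcommands and options (`shp import`, `pg import --database/--table`, `add`, `commit -m`) follow our reading of the GeoGig documentation and should be checked against `geogig help` on the installed 1.0-beta1 build; the function names, the repository path, the file name and the database and table names are hypothetical.

```python
import shlex
import subprocess


def shapefile_import_commands(shapefile, message):
    """Commands to import one shapefile, stage it and commit it."""
    return [
        ["geogig", "shp", "import", shapefile],
        ["geogig", "add"],
        ["geogig", "commit", "-m", message],
    ]


def postgis_import_commands(database, table, message):
    """Commands to import one PostGIS table, stage it and commit it."""
    return [
        ["geogig", "pg", "import", "--database", database, "--table", table],
        ["geogig", "add"],
        ["geogig", "commit", "-m", message],
    ]


def run(commands, repo_dir, dry_run=True):
    """Print each command; execute it inside repo_dir only if dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=repo_dir, check=True)


# Dry run with hypothetical names: prints the three commands only.
run(
    shapefile_import_commands("roads.shp", "Import roads shapefile"),
    repo_dir="/path/to/geogig-repo",
)
```

Keeping the dry run as the default makes it easy to review the exact commands with Kartoza before anything is written to a repository.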
