[RFC] How data is managed by new InaSAFE Realtime v4 #272

lucernae · 2018-02-27T05:30:57Z

For clarification, I will write up some rules and behaviour, and what to expect from data management in InaSAFE Realtime v4.

Goals

InaSAFE produces many outputs. We periodically clean up the intermediate products to save disk space.
Realtime needs to save the raw hazard data (shake grid, flood geojson, ash tif) so that an analysis can be reproduced in desktop or in Realtime whenever necessary.
Realtime needs to save the generated pdf reports so they can be quickly downloaded and shared to InAWARE.
Users who want to download intermediate products (hazard layers with keywords, or collections of analysis results with all their layers) will get them from the filesystem. If a product does not exist there, it should be generated first.
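The last goal (serve from the filesystem, generate first if missing) can be sketched as below. This is only an illustration under assumptions: `ensure_product` and the `generate` callback are hypothetical names, not existing Realtime functions, and the real regeneration would go through the InaSAFE analysis rather than a plain callable.

```python
from pathlib import Path
from typing import Callable


def ensure_product(path: Path, generate: Callable[[Path], None]) -> Path:
    """Return the path of a product, generating it first if it is missing.

    `generate` is a hypothetical callback that writes the product to `path`
    (for example, by re-running the analysis from the raw hazard data).
    """
    if not path.exists():
        # The product was never created or was already cleaned up.
        path.parent.mkdir(parents=True, exist_ok=True)
        generate(path)
    return path
```

A download view would call `ensure_product` before streaming the file, so the user sees no difference between a product that was kept and one that had to be regenerated, apart from the extra waiting time.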

Consequences

We will not be able to regenerate old InaSAFE v3.5 analyses. We can only keep the end products (the pdfs).
It is best to delete all intermediate products from old InaSAFE Realtime v3.5. These include: mmi contours, analysis layers, hazard layers, processed shake grid files, processed shapefiles for flood, and processed tif files for ash. This will free a lot of disk space that we probably don't need to keep.
We will not be able to download intermediate results from old InaSAFE Realtime v3.5.
We will keep the raw hazard data from old InaSAFE Realtime v3.5, but migrate it into the database for easier management if possible. This includes shake grid xml and flood data geojson. We still need to save the ash hazard tif as a file (no raster support in django 1.8).
Because intermediate results are not kept, they need to be regenerated whenever a user tries to download one that doesn't exist in the filesystem.
Regenerated products can potentially report different numbers or have a different layout, because exposures or report templates might have been updated in the meantime.
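Since a regenerated report may differ from the one originally published, the implementation notes say a saved report is not replaced unless it was explicitly deleted. A minimal sketch of that rule follows, using a plain dict as a stand-in for the database table; `store_report` and the event ids are hypothetical.

```python
from typing import Dict


def store_report(saved: Dict[str, str], event_id: str, report_path: str) -> str:
    """Save `report_path` for an event only if no report is stored yet.

    Returns the path that is stored after the call, which is the old one
    when the event already had a saved report.
    """
    # setdefault leaves an existing entry untouched.
    return saved.setdefault(event_id, report_path)


reports: Dict[str, str] = {}
store_report(reports, "shake-1", "reports/shake-1-original.pdf")
# A later regeneration does not overwrite the saved report.
store_report(reports, "shake-1", "reports/shake-1-regenerated.pdf")
# Deleting the entry (the django admin step) lets a new report be saved.
del reports["shake-1"]
store_report(reports, "shake-1", "reports/shake-1-regenerated.pdf")
```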

Implementation

Analysis happens normally and all the products (intermediate or final) will be stored on disk. The DB will only save the path references for each event.
A clean up job will run periodically, for example every week or every month, or based on a specified age (e.g. events older than one month). This will be done with a nightly/weekly celery task.
When a user tries to download an intermediate product that doesn't exist in the filesystem (e.g. an analysis layer or mmi contours for a given shake) because it has already been cleaned up, it needs to be regenerated.
Regenerated products can potentially produce a different report than the one currently saved. This can happen because exposures might have been updated (different analysis numbers) or report templates might have been updated (different report layouts). Thus we will not update saved reports in the database unless the saved report is explicitly deleted in django admin, which then allows the analysis to save a new report.
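The clean up step could look roughly like the sketch below. It assumes a hypothetical directory layout in which each event folder keeps its disposable files under an `intermediate/` subfolder, next to `raw/` and `reports/`; the actual Realtime layout may differ. Scheduling is left out: in Realtime this function would presumably be called from the periodic celery task.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 24 * 60 * 60


def clean_intermediate_products(
    root: Path, max_age_days: float = 30, now: Optional[float] = None
) -> List[Path]:
    """Delete old files under <root>/<event>/intermediate/ and return them.

    Raw hazard data and pdf reports live in other subfolders (an assumed
    layout) and are therefore never touched.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed: List[Path] = []
    for intermediate_dir in sorted(root.glob("*/intermediate")):
        for path in sorted(intermediate_dir.rglob("*")):
            # Only files last modified before the cutoff are removed.
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path)
    return removed
```

Returning the removed paths makes it easy to log what was deleted, or to clear the matching path references in the DB afterwards.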

CC @timlinux @Charlotte-Morgan @ismailsunni @myarjunar @Gustry

I will begin implementing the clean up procedures soon, so it would be best if everyone involved reads the consequences again to make sure everyone is on board with the idea, or comments if necessary.

lucernae added enhancement feature request labels Feb 27, 2018

lucernae self-assigned this Feb 27, 2018

lucernae added this to the InaSAFE v4 Migration milestone Feb 27, 2018
