[RFC] How data is managed by new InaSAFE Realtime v4 #272

Open
lucernae opened this issue Feb 27, 2018 · 0 comments

For clarification, I will write up some rules and behaviour, and what to expect from data management in InaSAFE Realtime v4.

Goals

  • InaSAFE produces many outputs. We periodically clean up the intermediate products to save disk space.
  • Realtime needs to save the raw hazard data (shake grid, flood geojson, ash tif) so that the analysis can be reproduced in desktop or in Realtime whenever necessary.
  • Realtime needs to save the generated pdf reports so that they can be quickly downloaded and shared to InAWARE.
  • Users who want to download any intermediate products (these include the hazard layer with keywords, and collections of analysis results with layers and everything) will download them from the filesystem. If a product does not exist there, it should be generated first.

Consequences

  • We will not be able to regenerate old InaSAFE v3.5 analyses. We can only keep the end products (the pdfs).
  • It is best to delete all intermediate products from old InaSAFE Realtime v3.5. These include: mmi contours, analysis layers, hazard layers, processed shake grid files, processed shapefiles for flood, processed tif files for ash. This will free a lot of disk space that we probably don't need to keep using.
  • We will not be able to download intermediate results from old InaSAFE Realtime v3.5.
  • We will keep the raw hazard data from old InaSAFE Realtime v3.5, but migrate it into the database for easier management if possible. This includes shake grid xml and flood data geojson. We still need to save the Ash hazard tif as a file (no raster support in django 1.8).
  • Because intermediate results are not saved, they need to be regenerated whenever a user tries to download one that doesn't exist in the filesystem.
  • Regenerated products can potentially report different numbers or layouts, because exposures or report templates might have been updated in the meantime.
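
To make the regenerate-on-download behaviour concrete, here is a minimal sketch in plain Python. It is not the Realtime code: `get_or_regenerate` and the `regenerate` callable are hypothetical names used only for illustration, and the real analysis step is stood in for by whatever function the caller passes.

```python
import os


def get_or_regenerate(path, regenerate):
    """Return `path`, rebuilding the file first if it is missing.

    `regenerate` is a hypothetical callable that writes the product
    (e.g. an analysis layer or mmi contours) to the given path.
    """
    if not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent:
            # The clean up job may have removed the whole folder.
            os.makedirs(parent, exist_ok=True)
        regenerate(path)
    return path
```

A download view would call this with the path stored for the event; the product is only rebuilt when the file is actually missing, so files that are still on disk are served unchanged.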

Implementation

  • Analysis happens normally and all the products (intermediate or final) will be stored on disk. The DB will only save the path references for each event.
  • A clean up job will run periodically, for example every week or every month, and can be limited to events of a specified age (e.g. events older than one month). This will happen using a nightly/weekly celery task.
  • When a user tries to download an intermediate product that doesn't exist in the filesystem (e.g. the analysis layer or mmi contours for a given shake) because it has already been cleaned up, it needs to be regenerated.
  • Regenerated products can potentially have a different report from the one currently saved. This can happen because exposures might be updated (different analysis numbers) or report templates are updated (different report layouts). Thus we will not update saved reports in the database unless they are explicitly deleted in django admin, which then allows the analysis to save new reports.
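
The clean up step above could look roughly like the sketch below. It assumes a hypothetical on-disk layout in which each event folder has an `intermediate` subfolder next to the raw hazard data and the pdf reports; the function name, the `(timestamp, folder)` pairs and the 30-day default are illustrative assumptions, not the actual Realtime layout or settings. In a deployment the function body would be wrapped in a periodic celery task, which is left out here so the example runs on its own.

```python
import os
import shutil
import time

SECONDS_PER_DAY = 24 * 60 * 60


def clean_intermediate_products(events, max_age_days=30, now=None):
    """Delete the intermediate folder of every event older than max_age_days.

    `events` is an iterable of (event_timestamp, event_dir) pairs, where
    event_timestamp is seconds since the epoch. Only the assumed
    `intermediate` subfolder is removed; raw hazard data and reports
    stored elsewhere in event_dir are left untouched.
    Returns the list of folders that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for event_timestamp, event_dir in events:
        target = os.path.join(event_dir, "intermediate")
        if event_timestamp < cutoff and os.path.isdir(target):
            shutil.rmtree(target)
            removed.append(target)
    return removed
```

Keeping the raw data and reports outside the folder that gets deleted means the job never has to decide file by file what is safe to remove, for example between a raw ash tif and a processed one.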

CC @timlinux @Charlotte-Morgan @ismailsunni @myarjunar @Gustry

I will begin implementing the clean up procedures soon, so it would be best if everyone involved reads the consequences again to make sure everyone is on board with the idea, or comments if necessary.
