Reduce the size of retrieved files #68

danieleongari · 2020-10-14T14:23:32Z

From a recent analysis, my 31 GB AiiDA repository contains roughly:

7 GB (23%) occupied by retrieved RASPA restart files;
6 GB (20%) occupied by RASPA output .data files;
other RASPA-related files (CIF, simulate.input, _aiidasubmit.sh, and force-field .def files) are under 1 GB, so there is no need to worry about them for now.
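
A breakdown like the one above can be reproduced by summing file sizes per category. The sketch below is a minimal illustration, not the script used for this analysis: it assumes you already have (relative path, size in bytes) pairs for the repository files, and the categorization rules (a `Restart` directory for restart files, `.data` files under an `Output` directory for outputs) are assumptions about the RASPA directory layout.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def categorize(path: str) -> str:
    """Assign a file to a coarse category (hypothetical rules based on the RASPA layout)."""
    p = PurePosixPath(path)
    if "Restart" in p.parts:
        return "restart"
    if "Output" in p.parts and p.suffix == ".data":
        return "output_data"
    return "other"


def size_breakdown(files):
    """Return {category: (total_bytes, fraction_of_total)} for (path, size) pairs."""
    totals = defaultdict(int)
    for path, size in files:
        totals[categorize(path)] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing to report for an empty or zero-size listing
    return {cat: (nbytes, nbytes / grand_total) for cat, nbytes in totals.items()}
```

The sizes could come, for example, from walking an exported copy of the repository with `os.walk` and `os.path.getsize`.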

I suggest taking action to reduce the size of the repository, given the large number of RaspaCalculations that are called during our work chains.

I suggest not retrieving the restart files by default, but using them in place on the remote computer instead, as we do, e.g., for the cube files in aiida-ddec.
However, I would leave an option for the user to retrieve them. This is needed, e.g., for saturated systems that require a long equilibration, or for the simulated annealing work chain of aiida-lsmo, where a CIF file is computed from the restart file (https://github.com/lsmo-epfl/aiida-lsmo/blob/6c32eebdaefd11dc226d849415e363c6222a475a/aiida_lsmo/workchains/sim_annealing.py#L88-L106).
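
The opt-in behaviour could look roughly like the sketch below. It is only an illustration under stated assumptions: the `retrieve_restart` settings key is a hypothetical name (not an existing aiida-raspa option), and the `Output/System_<i>` and `Restart/System_<i>` directory names are assumed from the usual RASPA layout. In a real plugin, the resulting list could be assigned to AiiDA's `CalcInfo.retrieve_list`.

```python
def build_retrieve_list(settings=None, n_systems=1):
    """Return the relative paths to retrieve after a RASPA run.

    Output files are always retrieved; restart files only when the
    (hypothetical) ``retrieve_restart`` setting is True, so by default
    they stay in the remote working directory.
    """
    settings = settings or {}
    retrieve = [f"Output/System_{i}" for i in range(n_systems)]
    if settings.get("retrieve_restart", False):
        retrieve += [f"Restart/System_{i}" for i in range(n_systems)]
    return retrieve
```

Defaulting to `False` keeps the repository small for the many short calculations in a work chain, while the simulated annealing case would simply pass `{"retrieve_restart": True}`.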

The output .data files need more thought, because they contain different types of data:

check values derived directly from the input (or from default parameters), which we might want to keep for reproducibility;
information about the cycles, printed every PrintEvery cycles, which one wants to check to see what is going on in the calculation. We generally use 10 checkpoints, so this is not a large amount of data and we should keep it;
outputs that cannot be suppressed but are useless for some calculations (e.g., null Gibbs/Widom results when doing GCMC calculations). We cannot suppress them unless we change the RASPA code.
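
To keep the per-cycle information to about 10 checkpoints, PrintEvery can be derived from the number of cycles. The helper below is a hypothetical sketch (not part of aiida-raspa); `NumberOfCycles` and `PrintEvery` are the RASPA input keywords it is meant to fill, and the exact number of printed blocks may differ slightly depending on how RASPA counts the first and last cycle.

```python
def print_every(number_of_cycles: int, n_checkpoints: int = 10) -> int:
    """Choose a PrintEvery value giving about n_checkpoints printouts."""
    if number_of_cycles <= 0 or n_checkpoints <= 0:
        raise ValueError("number_of_cycles and n_checkpoints must be positive")
    # Integer division; never go below 1, which would be an invalid interval.
    return max(1, number_of_cycles // n_checkpoints)
```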

Therefore, reducing the disk usage of the output files needs more discussion.
