Reduce the size of retrieved files #68

Open
danieleongari opened this issue Oct 14, 2020 · 0 comments

danieleongari commented Oct 14, 2020

A recent analysis of my 31GB AiiDA repository shows approximately:

  • 7GB (23%) occupied by retrieved Raspa restart files
  • 6GB (20%) occupied by Raspa output .data files

Other files related to Raspa (cif, simulate.input, _aiidasubmit.sh, and force field .def files) total under 1GB, so they are not a concern for now.
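
For anyone who wants to repeat this kind of breakdown, here is a minimal sketch. It assumes the repository is a plain directory tree in which files keep their original names, and that restart files sit under a folder named `Restart` while outputs end in `.data`. Both matching rules and the function names are illustrative assumptions, not something taken from aiida-raspa.

```python
import os
from collections import defaultdict


def size_by_category(root, categories):
    """Sum file sizes (in bytes) under `root`, grouped by category.

    `categories` maps a label to a predicate that receives the file path
    relative to `root`. The first matching label wins; files that match
    nothing are counted under "other".
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            label = next(
                (lab for lab, matches in categories.items() if matches(rel)),
                "other",
            )
            totals[label] += os.path.getsize(path)
    return dict(totals)


def as_percentages(totals):
    """Convert absolute sizes into percentages of the grand total."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * size / grand_total for label, size in totals.items()}


# Hypothetical matching rules: adjust them to the actual file layout.
EXAMPLE_CATEGORIES = {
    "restart": lambda rel: "Restart" in rel.split(os.sep),
    "output": lambda rel: rel.endswith(".data"),
}
```

Calling `as_percentages(size_by_category("/path/to/repository", EXAMPLE_CATEGORIES))` would then give the share of each category, where the path is a placeholder.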

I suggest we take action to reduce the size of the repository, given the large number of RaspaCalculation runs launched by our work chains.

Restart files

I suggest not retrieving them by default and instead using them locally, as we do, e.g., for cube files in aiida-ddec.
However, I would leave an option for the user to retrieve them. This matters for saturated systems that require a long equilibration, and for the simulated annealing work chains of aiida-lsmo, where a cif file is computed from the restart file (https://github.com/lsmo-epfl/aiida-lsmo/blob/6c32eebdaefd11dc226d849415e363c6222a475a/aiida_lsmo/workchains/sim_annealing.py#L88-L106).
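
As a rough illustration of the opt-in behaviour proposed above, the retrieve list could be assembled as in the sketch below. The function and the `retrieve_restart` flag are hypothetical names chosen for this example, not existing aiida-raspa options; in an AiiDA CalcJob the resulting list would typically end up in `calc_info.retrieve_list`.

```python
def build_retrieve_list(output_files, restart_files, retrieve_restart=False):
    """Return the files to copy back from the remote working directory.

    Output files are always retrieved. Restart files are retrieved only
    when the (hypothetical) `retrieve_restart` flag is set, so by default
    they stay on the remote machine and do not grow the repository.
    """
    retrieve = list(output_files)
    if retrieve_restart:
        retrieve.extend(restart_files)
    return retrieve
```

With the default off, work chains that need the restart file in the repository, such as the simulated annealing case, would have to set the flag explicitly.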

Output files

Here we need to think more carefully, because the output contains different types of data:

  • check values that are derived directly from the input (or from default parameters), which we might want to keep for reproducibility
  • information about the cycles, printed according to PrintEvery, which one checks to see what is going on in the calculation. We generally use 10 checkpoints, so this is not a huge amount of data and we should keep it.
  • outputs that cannot be suppressed but are useless for some calculations (e.g., null Gibbs/Widom results in GCMC calculations). We cannot suppress them without changing the Raspa code.

Therefore, the output files need more discussion before we can reduce their disk usage.
