I am working on a cluster where `squeue` and `sbatch` sometimes fail, for whatever reason.
Submitting 1080000 jobs in 1080 chunks using cluster functions 'Slurm' ...
Submitting [========>------------------------------------------] 17% eta: 11m
Error: Fatal error occurred: 101. Command 'sbatch' produced exit code 1. Output:
'sbatch: error: Invalid user for SlurmUser slurm, ignored
sbatch: fatal: Unable to process configuration file'
Submitting 15000000 jobs in 7500 chunks using cluster functions 'Slurm' ...
Submitting [===========================>-----------------------] 55% eta: 4h
Error: Listing of jobs failed (exit code 1);
cmd: 'squeue --user=$USER --states=R,S,CG,RS,SI,SO,ST --noheader --format=%i -r'
output:
squeue: error: Invalid user for SlurmUser slurm, ignored
squeue: fatal: Unable to process configuration file
These errors are transient; after resubmitting the jobs, everything continues as it should. It would be nice to have an option for this to happen automatically, so that one could let batchtools submit jobs overnight. My suggestion would be an option to retry Slurm commands X times with Y seconds of pause in between (possibly with exponential backoff).
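The retry-with-backoff behavior suggested above can be sketched as a small shell wrapper around the failing commands. This is only an illustration of the proposed semantics, not a batchtools feature; the function name `retry` and the variables `RETRY_MAX` / `RETRY_DELAY` are hypothetical:

```shell
# Hypothetical retry wrapper: run a command up to RETRY_MAX times,
# sleeping RETRY_DELAY seconds after a failure and doubling the delay
# each attempt (exponential backoff).
retry() {
  local max_tries="${RETRY_MAX:-5}"
  local delay="${RETRY_DELAY:-2}"
  local attempt=1
  while true; do
    # Run the wrapped command; on success, propagate exit code 0.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_tries" ]; then
      echo "retry: '$*' failed after $max_tries attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))       # exponential backoff
    attempt=$((attempt + 1))
  done
}

# Usage (illustrative):
#   retry sbatch job.sh
#   retry squeue --user="$USER" --noheader --format=%i -r
```

Inside batchtools the same loop would wrap its internal `sbatch`/`squeue` invocations, so a transient "Unable to process configuration file" error would be absorbed instead of aborting the whole submission.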