Hey!

I've tried the new spawning worker type, and I have some remarks about it.

In this section of `mcrit/mcrit/SpawningWorker.py` (lines 90 to 105 in 90d3344):
```python
LOGGER.error(f"Job {str(job.job_id)} running as child from SpawningWorker timed out during processing.")
return result_id
```
it should be taken into account that the `singlejobworker` subprocess might crash. We've noticed cases where it does crash, and when that happens `result_id = None` is returned and `Finished Remote Job with result_id: None` is logged. This causes the job to appear as finished in the web UI, but trying to access the result leads to an error page saying the result doesn't exist.

I think this can easily be handled with the following change (plus maybe logic to mark the job as failed):
```diff
 with job as j:
     LOGGER.info("Processing Remote Job: %s", job)
     result_id = self._executeJobPayload(j["payload"], job)
-    # result should have already been persisted by the child process, we repeat it here to close the job for the queue
-    job.result = result_id
-    LOGGER.info("Finished Remote Job with result_id: %s", result_id)
+    if result_id:
+        # result should have already been persisted by the child process, we repeat it here to close the job for the queue
+        job.result = result_id
+        LOGGER.info("Finished Remote Job with result_id: %s", result_id)
```
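For the "mark job as failed" part, I imagine something along these lines; this is only a sketch, and `job.mark_failed()` is a hypothetical helper standing in for whatever the queue's job interface actually offers:

```python
# Sketch only: extend the suggested result_id check with failure handling.
# job.mark_failed() is a hypothetical helper, not necessarily mcrit's real API.
with job as j:
    LOGGER.info("Processing Remote Job: %s", job)
    result_id = self._executeJobPayload(j["payload"], job)
    if result_id:
        # result should have already been persisted by the child process,
        # we repeat it here to close the job for the queue
        job.result = result_id
        LOGGER.info("Finished Remote Job with result_id: %s", result_id)
    else:
        # the child crashed or timed out, so no result exists;
        # flag that instead of letting the job look finished in the web UI
        LOGGER.error("Remote Job %s finished without a result, marking it as failed.", job)
        job.mark_failed()  # hypothetical helper; depends on the queue's job API
```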
One other thing I wanted to mention: logs from the child `singlejobworker` process aren't appearing when doing, for example, `docker logs <worker-container-id>`. Since you already get stdout and stderr of the child process, I think it's just a matter of logging them from the `SpawningWorker` parent process.
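To illustrate the idea, something like the following would make the child's output visible in the container logs. This is a rough sketch only; the command line is a placeholder, not mcrit's actual invocation:

```python
import logging
import subprocess

LOGGER = logging.getLogger(__name__)

# Spawn the child worker and capture its output (placeholder command line),
# then relay that output through the parent's logger so it surfaces in
# `docker logs` for the worker container.
process = subprocess.run(
    ["python", "-m", "mcrit", "singlejobworker"],
    capture_output=True,
    text=True,
)
if process.stdout:
    LOGGER.info("singlejobworker stdout:\n%s", process.stdout)
if process.stderr:
    LOGGER.warning("singlejobworker stderr:\n%s", process.stderr)
```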
Other than this the feature seems to be working great, and thank you for taking the time to implement it :)