Better handling of crashed processes #659
Merged
Conversation
Hello,

While the patch improves the situation, it leaves one issue open: the status file of a crashed process is not updated. This may be difficult to solve because of how storage is currently handled. In my case, where status files are stored as plain files, I may be able to solve it, but in the general case #354 will be the solution.

Best regards
DetachProcessing is intended to replace the broken MultiProcessing. Detach processing creates a sub-process that is detached from its parent process, i.e. its parent pid becomes 1. This prevents the sub-process from lingering as a zombie, as the default MultiProcessing does. This mode is crafted for the Linux platform. To use it, add the following configuration:

[processing]
mode=detachprocessing
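The detach-from-parent idea described above can be sketched with the classic double-fork technique. This is an illustrative sketch under stated assumptions, not the actual DetachProcessing implementation; the `detach` and `work` names are invented for the example:

```python
import os
import tempfile
import time

def detach(target, *args):
    """Run target(*args) in a process detached from the caller.

    Double-fork sketch: the intermediate child starts a new session and
    exits immediately, so the grandchild is re-parented (to pid 1 on a
    classic Linux init) and can never become a zombie of the caller.
    """
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)   # reap the short-lived intermediate child
        return
    os.setsid()              # detach from the caller's session
    if os.fork() > 0:
        os._exit(0)          # intermediate child exits right away
    try:
        target(*args)
    finally:
        os._exit(0)          # never return into the caller's code

# Demo: the detached worker records who its parent is.
fd, report = tempfile.mkstemp()
os.close(fd)

def work(path):
    with open(path, "w") as fh:
        fh.write(str(os.getppid()))

detach(work, report)

data = ""
for _ in range(50):          # poll until the detached worker has written
    with open(report) as fh:
        data = fh.read().strip()
    if data:
        break
    time.sleep(0.1)
new_ppid = int(data)
print("detached worker's parent pid:", new_ppid)
```

On a classic init system the printed pid is 1; inside a container or under a sub-reaper it may differ, but it is never the original caller's pid.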
In the current code the sub-process PID is set once to a wrong value; this function allows the sub-process to update the PID to its actual value.
The dblog.cleanup_crashed_process function implements a heuristic to detect crashed sub-processes, i.e. processes that did not terminate in the expected way. This may happen if a user kills the process with SIGKILL, if the process is killed by the OOM killer, if it exits on its own with exit(0), or in other cases I have not thought of. In these situations the process entry in the database is not updated to reflect that the process has finished. These stale entries stack up and block new requests from being processed. This function prevents that by checking whether a process with the same pid is still alive; if no process with that pid is found, we are sure the process is no longer running, so we update its status to failed. Currently this function is called when the limit of parallel requests is reached, to check whether some processes have crashed.
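The pid-liveness probe at the heart of this heuristic can be sketched with signal 0, which checks that a process exists without delivering anything. This is a sketch of the technique only, not the actual dblog.cleanup_crashed_process code, and `pid_is_alive` is an invented name:

```python
import os
import subprocess
import sys

def pid_is_alive(pid):
    """Heuristic check: does any process with this pid currently exist?"""
    try:
        os.kill(pid, 0)      # signal 0 probes without sending a signal
    except ProcessLookupError:
        return False         # no such pid: the process is certainly gone
    except PermissionError:
        return True          # pid exists but belongs to another user
    return True

alive = pid_is_alive(os.getpid())             # our own pid is alive

child = subprocess.Popen([sys.executable, "-c", ""])
child.wait()                                  # child reaped: its pid is freed
gone = pid_is_alive(child.pid)

print(alive, gone)
```

Note the asymmetry discussed in the overview: a missing pid is conclusive evidence the process is gone, but a present pid is not conclusive evidence it is still ours, since the kernel may have reused it.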
cehbrecht approved these changes on Sep 5, 2022
Overview
Improve the management of sub-processes on the Linux platform.
As described in #493, processes may crash while the database keeps listing them as running. At some point the server stops accepting new requests because it has reached the max parallel request limit. This patch series aims to handle this situation more properly on the Linux platform. As a preliminary requirement, the patch series must fix an issue with the MultiProcessing module, which can keep zombie processes around forever; an instance of this can be seen with `ps -f -u apache`. I think the issue is not limited to Apache. To fix it, this patch series provides a new DetachProcessing mode that actually detaches the processing of a request, ensuring that the new process becomes a child of pid 1. This guarantees that processes will not end up as zombies. DetachProcessing should be suitable for any use case on Linux, i.e. for servers other than Apache.
Thus, for the patch series to work, the configuration must use the mode detachprocessing; otherwise terminated processes cannot be detected. The heuristic used in the patch series is basically to check whether a process exists with the stored pid: if the pid is no longer there, we are sure the process is not running anymore. If the pid is still there, we cannot be sure, because Linux may reuse the pid for another process. This is why the heuristic is safe but not 100% accurate.
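As noted above, the new mode is selected in the pywps configuration file:

```ini
[processing]
mode=detachprocessing
```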
I tried several more accurate heuristics, but sometimes they do not work and other times they make things much more complex.
Moreover, the patch checks and cleans up sub-processes only when pywps reaches the max parallel process limit, which means a process may be considered not finished for a long time, i.e. until that limit is reached. It would be better to also run the check at every status request, but implementing that requires addressing #354.
Best regards
Related Issue / Discussion
This is related to #493
Additional Information
Contribution Agreement
(as per https://github.com/geopython/pywps/blob/master/CONTRIBUTING.rst#contributions-and-licensing)