ducktape: handle SIGABORT in Redpanda #24983
base: dev
Currently, in raise_on_crash, we detect whether Redpanda has crashed in order to surface that as the failure reason, instead of something like a timeout or connection refused, which invariably occurs when Redpanda dies during a test. However, we didn't include "Aborting on shard" in the crash regex, which is what occurs when an abort is triggered, and that happens in various scenarios, including OOM.
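To illustrate the kind of check described above, here is a minimal sketch of matching crash lines with a regex. The names `CRASH_PATTERNS`, `CRASH_RE`, and `find_crash_line` are hypothetical and not taken from the Redpanda test code; only the "Aborting on shard" string comes from this PR, and the other pattern is an illustrative assumption.

```python
import re

# Hypothetical pattern list, not the actual regex in the Redpanda tests.
CRASH_PATTERNS = [
    r"Segmentation fault",  # illustrative assumption
    r"Aborting on shard",   # the string this PR adds to the crash regex
]
CRASH_RE = re.compile("|".join(CRASH_PATTERNS))


def find_crash_line(log_lines):
    """Return the first log line that looks like a crash, or None."""
    for line in log_lines:
        if CRASH_RE.search(line):
            return line.rstrip("\n")
    return None


log = [
    "INFO  starting up\n",
    "Aborting on shard 3.\n",
    "Backtrace: ...\n",
]
print(find_crash_line(log))
```

Without the second pattern, an abort (for example on OOM) would go unmatched here, and the test would report whatever secondary failure happened first.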
How is this stuff different from the one in utils.py which already has this?
I guess that's used for "log search" which finds problematic lines in the log (e.g., errors), the thing I changed is for
Right, so we already detected it, just in a different path? What does raise_on_crash do differently? I guess it gets tagged more directly?
The log search path, OTOH, is executed when the test succeeds, in order to possibly fail it if we find a bad log line. So it makes sense for the log search path to detect a large set of problems: we don't want to replace a test failure with an "error detected in logs" in general, but we do want to do that for a crash.
Yes, I think so, but in practice that almost never happens, because when Redpanda crashes the test fails in some way first, so the log search doesn't happen. So yes, as far as I know, this is about better tagging.
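The difference between the two paths discussed above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual ducktape or Redpanda test code: `run_test`, `first_match`, `NodeCrash`, `BadLogLines`, and both marker lists are hypothetical names.

```python
class NodeCrash(Exception):
    """Hypothetical error: the broker crashed during the test."""


class BadLogLines(Exception):
    """Hypothetical error: the test passed but the log has bad lines."""


# Hypothetical marker lists; only "Aborting on shard" comes from the PR.
CRASH_MARKERS = ["Aborting on shard"]
BAD_LOG_MARKERS = ["ERROR", "Aborting on shard"]


def first_match(lines, markers):
    """Return the first line containing any marker, or None."""
    for line in lines:
        if any(marker in line for marker in markers):
            return line
    return None


def run_test(test_fn, read_log):
    try:
        test_fn()
    except Exception as original:
        # Crash path: runs only when the test failed. A crash replaces the
        # original failure (timeout, connection refused, ...) as the reason.
        crash = first_match(read_log(), CRASH_MARKERS)
        if crash is not None:
            raise NodeCrash(crash) from original
        raise  # no crash found: keep the original failure
    # Log search path: runs only when the test succeeded.
    bad = first_match(read_log(), BAD_LOG_MARKERS)
    if bad is not None:
        raise BadLogLines(bad)
```

In this sketch, a test that times out because the broker aborted is reported as `NodeCrash` rather than as a timeout, while a generic error line never overrides a real test failure.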