DAOS-16876 vos: discard invalid DTX when commit or abort - b26 #15901
base: release/2.6
Ticket title is 'LRZ: m02r01s07dao engine coredumps with vos EMRG src/vos/ilog.c:411 ilog_open() Assertion'
b8498f7
to
864a9de
Test stage NLT on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15901/1/display/redirect
Test stage NLT on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15901/2/display/redirect
Test stage Unit Test on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15901/2/display/redirect
Test stage Unit Test bdev with memcheck on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15901/2/display/redirect
Test stage Unit Test with memcheck on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15901/2/display/redirect
864a9de
to
d700671
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15901/5/testReport/
When committing or aborting a DTX, we check whether it is a valid entry. If it is invalid, we discard it with a warning message and increment the related metrics counter. This may not be a perfect solution, but it helps the user clean up the system efficiently.
Signed-off-by: Jeff Olivier <[email protected]>
Signed-off-by: Fan Yong <[email protected]>
d700671
to
649a287
	if (rc == 0 && opc != ILOG_OP_UPDATE) {
		if (version == ilog_mag2ver(lctx->ic_root->lr_magic)) {
			D_WARN("ilog entry on %s doesn't exist\n", opc_str[opc]);
			return -DER_NONEXIST;
I don't quite follow this.
As I understand it, when we commit or abort the ilog entry via dtx_{commit,abort}, the version is bumped if the commit/abort succeeds. So if the version has not been bumped here, the entry must not have been found. @jolivier23, is that what you expect?
That is correct. It means the entry is no longer in the ilog. This code is shared between abort, persist, and update (or insert), but only the former two need to remove the ilog entry from the DTX record.
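The version check discussed above can be sketched in isolation. This is only a toy model under stated assumptions, not the VOS code: `toy_ilog`, `toy_ilog_apply()`, `toy_ilog_modify()` and the `TOY_ERR_*` values are all hypothetical stand-ins. It assumes every operation that actually changes the log bumps a version counter, so a persist or abort that returns success without a bump can only mean the entry was not there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; NOT the DAOS definitions. */
enum toy_op { TOY_OP_UPDATE, TOY_OP_PERSIST, TOY_OP_ABORT };
#define TOY_ERR_NONEXIST 1005 /* placeholder value, returned negated */
#define TOY_ERR_NOSPACE  1007 /* placeholder value, returned negated */
#define TOY_MAX_ENTRIES  8

struct toy_entry {
	uint32_t tx_id;
	int      committed;
};

struct toy_ilog {
	uint32_t         version; /* bumped on every real change */
	size_t           count;
	struct toy_entry entries[TOY_MAX_ENTRIES];
};

/* Apply opc to the entry owned by tx_id; bump version only on a change. */
static int
toy_ilog_apply(struct toy_ilog *log, uint32_t tx_id, enum toy_op opc)
{
	size_t i;

	for (i = 0; i < log->count; i++) {
		if (log->entries[i].tx_id != tx_id)
			continue;
		if (opc == TOY_OP_UPDATE)
			return 0; /* entry already present, nothing to change */
		if (opc == TOY_OP_PERSIST) {
			log->entries[i].committed = 1;
		} else {
			/* abort: drop the entry by moving the last one into its slot */
			log->entries[i] = log->entries[log->count - 1];
			log->count--;
		}
		log->version++;
		return 0;
	}

	if (opc != TOY_OP_UPDATE)
		return 0; /* nothing found, nothing changed, no bump */

	if (log->count == TOY_MAX_ENTRIES)
		return -TOY_ERR_NOSPACE;
	log->entries[log->count].tx_id = tx_id;
	log->entries[log->count].committed = 0;
	log->count++;
	log->version++;
	return 0;
}

/* Shared entry point: detect "entry not found" by an unchanged version. */
static int
toy_ilog_modify(struct toy_ilog *log, uint32_t tx_id, enum toy_op opc)
{
	uint32_t before;
	int      rc;

	assert(log != NULL);
	before = log->version;
	rc = toy_ilog_apply(log, tx_id, opc);
	if (rc == 0 && opc != TOY_OP_UPDATE && before == log->version)
		return -TOY_ERR_NONEXIST;
	return rc;
}
```

In this model an update is exempt from the check because updating an entry that already exists is a legitimate no-op, while a persist or abort that changes nothing signals a missing entry.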
@@ -573,7 +574,7 @@ dtx_ilog_rec_release(struct umem_instance *umm, struct vos_container *cont,

 	ilog_close(loh);

-	if (rc != 0)
+	if (rc != 0 && rc != -DER_NONEXIST)
Isn't it too late to check DER_NONEXIST here? I suppose the error is already returned when the ilog_open() above fails, right?
The log message here does not matter; the caller, do_dtx_rec_release(), will print it.
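One way to read this exchange is sketched below, assuming the changed condition only guards an error log in the callee and the return code still propagates. The function names, counters, and `TOY_ERR_*` values are hypothetical and do not come from the DAOS sources; the sketch only shows that suppressing the callee's message for a "not found" result loses nothing when the caller reports the failure itself.

```c
#include <assert.h>
#include <stdio.h>

/* Toy error codes for illustration only; NOT the DAOS definitions. */
#define TOY_ERR_NONEXIST 1005
#define TOY_ERR_IO       1

/* Counters so the logging behaviour can be observed (hypothetical). */
static int toy_callee_logs;
static int toy_caller_logs;

/* Callee: stays quiet for "not found", logs every other failure. */
static int
toy_ilog_rec_release(int rc_from_ilog)
{
	int rc = rc_from_ilog;

	if (rc != 0 && rc != -TOY_ERR_NONEXIST) {
		fprintf(stderr, "callee: failed to release ilog rec: %d\n", rc);
		toy_callee_logs++;
	}
	return rc; /* the code is still returned, including "not found" */
}

/* Caller: reports any failure, so "not found" is still visible here. */
static int
toy_rec_release(int rc_from_ilog)
{
	int rc = toy_ilog_rec_release(rc_from_ilog);

	if (rc != 0) {
		fprintf(stderr, "caller: release failed: %d\n", rc);
		toy_caller_logs++;
	}
	return rc;
}
```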
ftest LGTM
Please help to review the patch that is required for 2.6.3-rc4, thanks!
When committing or aborting a DTX, we check whether it is a valid entry. If it is invalid, we discard it with a warning message and increment the related metrics counter.
This may not be a perfect solution, but it helps the user clean up the system efficiently.
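A minimal sketch of the "discard with warning and count" behaviour described here, under loud assumptions: the description does not say how validity is decided, so the magic-value check, the struct layouts, and the counter name `invalid_discarded` are all hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical marker for a well-formed entry; NOT a DAOS constant. */
#define TOY_DTX_MAGIC 0x44545831u

struct toy_dtx_entry {
	uint32_t magic;
	int      committed;
};

struct toy_metrics {
	uint64_t invalid_discarded; /* hypothetical counter name */
};

enum toy_result { TOY_COMMITTED, TOY_DISCARDED };

/* Assumed validity rule for this sketch: the magic value must match. */
static int
toy_dtx_is_valid(const struct toy_dtx_entry *e)
{
	return e->magic == TOY_DTX_MAGIC;
}

/* Commit a valid entry; discard an invalid one with a warning and a count. */
static enum toy_result
toy_dtx_commit(struct toy_dtx_entry *e, struct toy_metrics *m)
{
	assert(e != NULL && m != NULL);

	if (!toy_dtx_is_valid(e)) {
		fprintf(stderr, "WARN: discarding invalid DTX entry (magic 0x%x)\n",
			(unsigned int)e->magic);
		m->invalid_discarded++;
		return TOY_DISCARDED;
	}

	e->committed = 1;
	return TOY_COMMITTED;
}
```

The point of the counter is that discarded entries stay observable after the warning has scrolled out of the log.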
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: