-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensuring the integrity of full snapshot before uploading it to the object store. #779
base: master
Are you sure you want to change the base?
Conversation
Performance tests have been performed with etcd of
Backups taken by backup-restore with their sizes on disk.
Then a big full-snapshot is triggered by backup-restore:
Then restoration is triggered:
|
/assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @ishan16696.
PR looks good, I just have 2 small suggestions. PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it and things are working good.
LGTM
/assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another suggestion for better error handling, as discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit, but otherwise the PR looks in great condition.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ishan16696 for filling the gaps in my understanding in validation, and addressing all review comments!
@ishan16696 could you rebase your PR on master? This will fix the failing integration tests. |
Rename few variables.
9b55def
to
2244885
Compare
return nil, err | ||
} | ||
|
||
if _, err := io.Copy(db, snapshotData); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance Improvement:
io.Copy
uses a default32KB
buffer, whileio.CopyBuffer
withhashBufferSize
(4MB) reduces system calls significantly.- For example, copying a 1GB snapshot:
- With
io.Copy
: ~32,768 system calls (32KB buffer) - With
io.CopyBuffer
: ~256 system calls (4MB buffer)
- With
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point but I'm not sure will it be significant improvement or not as etcd usually run on powerful VMs and using small buffer (32KB
in io.copy) or big buffer (2MB or 4MB
in io. CopyBuffer) has it's pros and cons:
- small buffer will use less memory, and more no. of sys calls.
- big buffer will use more memory and less no. of sys calls.
what about buffer of size 1MB
for both which will reduce the sys calls as well as reduce the memory usage as well ? and we will use your benchmark test to analyse both case, wdyt ?
What this PR does / why we need it:
It has been observed that while doing the restoration from full snapshot sometimes backup-restore failed to restore the etcd due to sometimes full snapshots got corrupted or missing a hash.
This PR is try to minimise the occurrence of such scenarios by verifying the integrity of full snapshot before uploading it to the object store.
Which issue(s) this PR fixes:
Fixes #778
Special notes for your reviewer:
This is how it's done:
Release note: