-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libzutil: optimize zpool_read_label with AIO #11467
Conversation
30c1fe4
to
fdc4d6a
Compare
@@ -45,7 +45,8 @@ libzutil_la_LIBADD = \ | |||
|
|||
if BUILD_LINUX | |||
libzutil_la_LIBADD += \ | |||
$(abs_top_builddir)/lib/libefi/libefi.la | |||
$(abs_top_builddir)/lib/libefi/libefi.la \ | |||
-lrt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought. But rather than pulling in a new dependency on librt
which could cause us some new portability problems. Should we instead parallelize this using the thread pool library, libtpool
, which has already been added above? Or would the setup/teardown costs offset any real gains. In particular, I'm thinking about macos. @lundman am I needlessly concerned here?
On a related note, librt
already exists in my default Ubuntu initrd so I don't see any issue there. We shouldn't be making it any bigger for most Linux distributions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On FreeBSD and I think OSX too, lio_listio will be more efficient than using a thread pool. On Linux it's probably about the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other possible concern with using AIO here is that there may be some niche filesystem, or maybe even block device drivers, on Linux which don't fully support it. There are so many it's impossible to keep track. It's possible this could break some existing setups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I know of one. On FreeBSD, using file-backed vdevs with the files stored on NFS will fail. There's a sysctl that you can change to allow it, though. But nobody has complained about it since that change originally landed in FreeBSD in 2017.
I could easily add a serialized fallback path for the case of ENOSYS or EAGAIN. But my concern is that using the fallback path for all error types would double the amount of disk activity even for non-recoverable errors like EACCES. On Linux, I don't think there should be trouble with any particular file system, because POSIX AIO is implemented by glibc using threads. So as long as librt is available, it should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Then I'm happy with this as along as we confirm we're not causing a real problem for macos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just shoot me a message if you need something tested on macOS or Windows ports. The librt is a bit of a pain, I've yet to solve it in a way I like :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would definitely be helpful if you could test this PR on OSX. If you don't want to run the full test suite, then just do "zpool import" with both file-backed and disk-backed pools. That should be sufficient. And there's a Windows port too now? Sheesh; I had no idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I encountered no issues:
# ./scripts/cmd-macos.sh zpool import
pool: BOOM
id: 5628038252777279005
state: ONLINE
BOOM ONLINE
media-3628A8FE-7BD0-8B42-89B9-6D46D57607EF ONLINE
# ./scripts/cmd-macos.sh zpool import -d ~/
pool: test
id: 7967859940384804015
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
test ONLINE
/Users/lundman/disk.img ONLINE
It is interesting to note that userland still uses module/os/linux/zfs/vdev_file.c
on all platforms, not sure how I feel about that yet. But that isn't connected to the changes you made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still uses module/os/linux/zfs/vdev_file.c on all platforms
Yeah, that's odd. We should probably either make it clear it's common by moving it to module/zfs/vdev_file.c
, or decide it really not and have per-platform versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came across it when working on Windows, where vdev_file.c (was) quite different (now in 2.0 port uses zfs_file_* API). I believe one other os/linux file is also taken from Linux, was it abd_os.c or arc_os.c hmm... At the time I couldn't get my own vdev_file.c to work in userland (didn't want to spend more than 5 mins shaving yaks at that time). It is clearly working at the moment, but could be some Linux specific changes in future could surprise someone. So a userland only vdev_file.c could work, or, I'll try again to switch to os/macos/vdev_file (and windows+Freebsd).
@asomers seems like the CI had a hiccup and didn't fully test this commit. Would you might rebasing it on master and force updating the PR so we can get a full test run. |
zpool_read_label doesn't need the full labels including uberblocks. It only needs the vdev_phys_t. This reduces by half the amount of data read to check for a label, speeding up "zpool import", "zpool labelclear", etc. Originally committed as https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc Obtained from: FreeBSD Sponsored by: Spectra Logic Corp, Axcient Signed-off-by: Alan Somers <[email protected]>
Read all labels in parallel instead of sequentially. Originally committed as https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8 Obtained from: FreeBSD Sponsored by: Spectra Logic, Axcient Signed-off-by: Alan Somers <[email protected]>
fdc4d6a
to
dabd35d
Compare
rebased. |
The lint failures aren't caused by this PR. It looks like they came in from master when I rebased. |
Indeed. It looks like they somehow snuck in with todays ABD change. I'll see about sorting it out in a different PR. |
@@ -45,7 +45,8 @@ libzutil_la_LIBADD = \ | |||
|
|||
if BUILD_LINUX | |||
libzutil_la_LIBADD += \ | |||
$(abs_top_builddir)/lib/libefi/libefi.la | |||
$(abs_top_builddir)/lib/libefi/libefi.la \ | |||
-lrt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I encountered no issues:
# ./scripts/cmd-macos.sh zpool import
pool: BOOM
id: 5628038252777279005
state: ONLINE
BOOM ONLINE
media-3628A8FE-7BD0-8B42-89B9-6D46D57607EF ONLINE
# ./scripts/cmd-macos.sh zpool import -d ~/
pool: test
id: 7967859940384804015
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
test ONLINE
/Users/lundman/disk.img ONLINE
It is interesting to note that userland still uses module/os/linux/zfs/vdev_file.c
on all platforms, not sure how I feel about that yet. But that isn't connected to the changes you made.
Read all labels in parallel instead of sequentially. Originally committed as https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8 Obtained from: FreeBSD Sponsored by: Spectra Logic, Axcient Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes #11467
zpool_read_label doesn't need the full labels including uberblocks. It only needs the vdev_phys_t. This reduces by half the amount of data read to check for a label, speeding up "zpool import", "zpool labelclear", etc. Originally committed as https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc Obtained from: FreeBSD Sponsored by: Spectra Logic Corp, Axcient Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes openzfs#11467
Read all labels in parallel instead of sequentially. Originally committed as https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8 Obtained from: FreeBSD Sponsored by: Spectra Logic, Axcient Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes openzfs#11467
zpool_read_label doesn't need the full labels including uberblocks. It only needs the vdev_phys_t. This reduces by half the amount of data read to check for a label, speeding up "zpool import", "zpool labelclear", etc. Originally committed as https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc Obtained from: FreeBSD Sponsored by: Spectra Logic Corp, Axcient Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes openzfs#11467
Read all labels in parallel instead of sequentially. Originally committed as https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8 Obtained from: FreeBSD Sponsored by: Spectra Logic, Axcient Reviewed-by: Jorgen Lundman <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alek Pinchuk <[email protected]> Signed-off-by: Alan Somers <[email protected]> Closes openzfs#11467
Motivation and Context
zpool_read_label is used by many commands, including
zpool import
. Its runtime is dominated by 4 serialized disk reads. It also reads more than twice as much data as it really needs, because it doesn't inspect the uberblocks.Description
This PR reduces the amount of data read and reads all four labels in parallel.
How Has This Been Tested?
A version of this change has been used in FreeBSD since 2017, and internally at Spectra Logic for a few years before that.
Along with a few other changes that I will submit separately, this reduced the
zpool import
time for a large HDD-backed zpool from 121s to 16s.Types of changes
Checklist:
Signed-off-by
.