Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libzutil: optimize zpool_read_label with AIO #11467

Closed
wants to merge 2 commits into from

Conversation

asomers
Copy link
Contributor

@asomers asomers commented Jan 15, 2021

Motivation and Context

zpool_read_label is used by many commands, including zpool import. Its runtime is dominated by 4 serialized disk reads. It also reads more than twice as much data as it really needs, because it doesn't inspect the uberblocks.

Description

This PR reduces the amount of data read and reads all four labels in parallel.

How Has This Been Tested?

A version of this change has been used in FreeBSD since 2017, and internally at Spectra Logic for a few years before that.

Along with a few other changes that I will submit separately, this reduced the zpool import time for a large HDD-backed zpool from 121s to 16s.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@asomers asomers force-pushed the zpool_read_label_aio branch from 30c1fe4 to fdc4d6a Compare January 15, 2021 17:50
@ghost ghost added the Status: Code Review Needed Ready for review and testing label Jan 15, 2021
@@ -45,7 +45,8 @@ libzutil_la_LIBADD = \

if BUILD_LINUX
libzutil_la_LIBADD += \
$(abs_top_builddir)/lib/libefi/libefi.la
$(abs_top_builddir)/lib/libefi/libefi.la \
-lrt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought. But rather than pulling in a new dependency on librt which could cause us some new portability problems. Should we instead parallelize this using the thread pool library, libtpool, which has already been added above? Or would the setup/teardown costs offset any real gains. In particular, I'm thinking about macos. @lundman am I needlessly concerned here?

On a related note, librt already exists in my default Ubuntu initrd so I don't see any issue there. We shouldn't be making it any bigger for most Linux distributions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On FreeBSD and I think OSX too, lio_listio will be more efficient than using a thread pool. On Linux it's probably about the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other possible concern with using AIO here is that there may be some niche filesystem, or maybe even block device drivers, on Linux which don't fully support it. There are so many it's impossible to keep track. It's possible this could break some existing setups.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I know of one. On FreeBSD, using file-backed vdevs with the files stored on NFS will fail. There's a sysctl that you can change to allow it, though. But nobody has complained about it since that change originally landed in FreeBSD in 2017.

I could easily add a serialized fallback path for the case of ENOSYS or EAGAIN. But my concern is that using the fallback path for all error types would double the amount of disk activity even for non-recoverable errors like EACCES. On Linux, I don't think there should be trouble with any particular file system, because POSIX AIO is implemented by glibc using threads. So as long as librt is available, it should work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Then I'm happy with this as along as we confirm we're not causing a real problem for macos.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just shoot me a message if you need something tested on macOS or Windows ports. The librt is a bit of a pain, I've yet to solve it in a way I like :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would definitely be helpful if you could test this PR on OSX. If you don't want to run the full test suite, then just do "zpool import" with both file-backed and disk-backed pools. That should be sufficient. And there's a Windows port too now? Sheesh; I had no idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I encountered no issues:

# ./scripts/cmd-macos.sh zpool import

   pool: BOOM
     id: 5628038252777279005
  state: ONLINE

	BOOM                                          ONLINE
	  media-3628A8FE-7BD0-8B42-89B9-6D46D57607EF  ONLINE

# ./scripts/cmd-macos.sh zpool import -d ~/
   pool: test
     id: 7967859940384804015
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	test                       ONLINE
	  /Users/lundman/disk.img  ONLINE

It is interesting to note that userland still uses module/os/linux/zfs/vdev_file.c on all platforms, not sure how I feel about that yet. But that isn't connected to the changes you made.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still uses module/os/linux/zfs/vdev_file.c on all platforms

Yeah, that's odd. We should probably either make it clear it's common by moving it to module/zfs/vdev_file.c, or decide it really not and have per-platform versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came across it when working on Windows, where vdev_file.c (was) quite different (now in 2.0 port uses zfs_file_* API). I believe one other os/linux file is also taken from Linux, was it abd_os.c or arc_os.c hmm... At the time I couldn't get my own vdev_file.c to work in userland (didn't want to spend more than 5 mins shaving yaks at that time). It is clearly working at the moment, but could be some Linux specific changes in future could surprise someone. So a userland only vdev_file.c could work, or, I'll try again to switch to os/macos/vdev_file (and windows+Freebsd).

@behlendorf
Copy link
Contributor

behlendorf commented Jan 20, 2021

@asomers seems like the CI had a hiccup and didn't fully test this commit. Would you might rebasing it on master and force updating the PR so we can get a full test run.

zpool_read_label doesn't need the full labels including uberblocks.  It
only needs the vdev_phys_t.  This reduces by half the amount of data
read to check for a label, speeding up "zpool import", "zpool
labelclear", etc.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc

Obtained from:	FreeBSD
Sponsored by:	Spectra Logic Corp, Axcient
Signed-off-by:	Alan Somers <[email protected]>
Read all labels in parallel instead of sequentially.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8

Obtained from:	FreeBSD
Sponsored by:	Spectra Logic, Axcient
Signed-off-by:	Alan Somers <[email protected]>
@asomers asomers force-pushed the zpool_read_label_aio branch from fdc4d6a to dabd35d Compare January 20, 2021 20:34
@asomers
Copy link
Contributor Author

asomers commented Jan 20, 2021

rebased.

@asomers
Copy link
Contributor Author

asomers commented Jan 20, 2021

The lint failures aren't caused by this PR. It looks like they came in from master when I rebased.

@behlendorf
Copy link
Contributor

Indeed. It looks like they somehow snuck in with todays ABD change. I'll see about sorting it out in a different PR.

@@ -45,7 +45,8 @@ libzutil_la_LIBADD = \

if BUILD_LINUX
libzutil_la_LIBADD += \
$(abs_top_builddir)/lib/libefi/libefi.la
$(abs_top_builddir)/lib/libefi/libefi.la \
-lrt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I encountered no issues:

# ./scripts/cmd-macos.sh zpool import

   pool: BOOM
     id: 5628038252777279005
  state: ONLINE

	BOOM                                          ONLINE
	  media-3628A8FE-7BD0-8B42-89B9-6D46D57607EF  ONLINE

# ./scripts/cmd-macos.sh zpool import -d ~/
   pool: test
     id: 7967859940384804015
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	test                       ONLINE
	  /Users/lundman/disk.img  ONLINE

It is interesting to note that userland still uses module/os/linux/zfs/vdev_file.c on all platforms, not sure how I feel about that yet. But that isn't connected to the changes you made.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jan 21, 2021
behlendorf pushed a commit that referenced this pull request Jan 21, 2021
Read all labels in parallel instead of sequentially.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8

Obtained from: FreeBSD
Sponsored by: Spectra Logic, Axcient
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes #11467
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
zpool_read_label doesn't need the full labels including uberblocks.  It
only needs the vdev_phys_t.  This reduces by half the amount of data
read to check for a label, speeding up "zpool import", "zpool
labelclear", etc.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc

Obtained from: FreeBSD
Sponsored by: Spectra Logic Corp, Axcient
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes openzfs#11467
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
Read all labels in parallel instead of sequentially.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8

Obtained from: FreeBSD
Sponsored by: Spectra Logic, Axcient
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes openzfs#11467
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
zpool_read_label doesn't need the full labels including uberblocks.  It
only needs the vdev_phys_t.  This reduces by half the amount of data
read to check for a label, speeding up "zpool import", "zpool
labelclear", etc.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=63f8025d6acab1b334373ddd33f940a69b3b54cc

Obtained from: FreeBSD
Sponsored by: Spectra Logic Corp, Axcient
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes openzfs#11467
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Read all labels in parallel instead of sequentially.

Originally committed as
https://cgit.freebsd.org/src/commit/?id=b49e9abcf44cafaf5cfad7029c9a6adbb28346e8

Obtained from: FreeBSD
Sponsored by: Spectra Logic, Axcient
Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alek Pinchuk <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Closes openzfs#11467
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants