
Switch live rootfs from squashfs to EROFS #1852

Open · jlebon opened this issue Dec 17, 2024 · 50 comments

Labels: area/bootable-containers, format/live-iso, format/live-pxe, jira, kind/enhancement, platform/metal, priority/high, status/pending-next-release

@jlebon (Member) commented Dec 17, 2024

Currently, we ship a rootfs.img CPIO (both as a separate artifact and as part of the live ISO) which contains the rootfs as a squashfs image.

Let's switch it over to use EROFS instead. Since EROFS is already in use by composefs, this reduces the number of read-only filesystem image formats we have to care about. RHEL 10 is planning to use EROFS for all its live media, and RHCOS is no exception. So a primary motivation here is to align FCOS and RHCOS, do things upstream first, and reduce the maintenance burden. In the process, though, we should definitely ensure that EROFS does not introduce major time or size regressions at build time or consumption time.

This work would happen in osbuild since that's where our live ISO is now being built.
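
For concreteness, a minimal sketch of the tooling difference (paths, image names, and options here are illustrative, not the actual osbuild stage configuration):

```sh
# Today: pack the live root tree into a squashfs image
mksquashfs rootfs/ root.squashfs -comp zstd -noappend

# Proposed: pack the same tree into an EROFS image instead
# (note the argument order: output image first, then the source tree)
mkfs.erofs -zlzma root.erofs rootfs/
```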

@AdamWill

AdamWill commented Dec 19, 2024

@jlebon asked me to drop a link here to @Conan-Kudo's work on Fedora's Kiwi-built lives: see https://pagure.io/fedora-kiwi-descriptions/pull-request/105. We also already have an EROFS image in the main compose, one of the FEX images for Asahi: https://pagure.io/fedora-kiwi-descriptions/blob/rawhide/f/teams/asahi.xml#_6.

[A long exchange between @Conan-Kudo and @hsiangkao has been minimized.]

@Conan-Kudo

I'm planning on making a Self-Contained Change for this. Now that I'm satisfied, I can start writing one up.

@dustymabe (Member)

@Conan-Kudo Are you interested in some Co-Owners?

@Conan-Kudo

Sure.

@Conan-Kudo

Here's what I've got so far: https://fedoraproject.org/wiki/Changes/EROFSforLiveMedia

@Conan-Kudo

Someone interested in being part of the FCOS side of this can add themselves as co-owners and add their own relevant bits to the Change document.

@Conan-Kudo

FYI @supakeen to add osbuild relevant stuff to the change

@hsiangkao commented Jan 8, 2025

> @hsiangkao Thanks! Those improvements look great!

Another thing to mention is that currently fsck.erofs --extract doesn't support the fragment cache, so extraction will be slow (I suggest using mount and cp for extraction for now); I will find time to complete it later. But if anyone is interested, help on this would be much appreciated, since too many other things are in flight on my side.

Also, -C# and -zlzma,dictsize=# will impact memory usage; if the smaller sizes work almost the same, it'd be better to use smaller numbers.
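
For anyone following along, the mount-and-cp route suggested above looks roughly like this (image name and mount point are illustrative; loop-mounting requires privileges):

```sh
# Loop-mount the EROFS image read-only and copy everything out,
# avoiding fsck.erofs --extract, which is slow without fragment-cache support
mkdir -p mnt extracted
sudo mount -t erofs -o ro,loop root.erofs mnt
sudo cp -a mnt/. extracted/
sudo umount mnt
```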

@supakeen commented Jan 8, 2025

> FYI @supakeen to add osbuild relevant stuff to the change

Thank you; the relevant people are @dustymabe, @jlebon, and @ravanelli, who are all in this thread.

If CoreOS wants to switch to EROFS: as far as I'm aware, we have all the bits and bobs available in osbuild to do so, though perhaps some more options need to be piped through.

Perhaps @bcl is interested too, since they're working on a similar change for RHEL 10: osbuild/images#1117

@jlebon jlebon added the meeting topics for meetings label Jan 8, 2025
@bcl commented Jan 8, 2025

> If CoreOS wants to switch to EROFS: as far as I'm aware, we have all the bits and bobs available in osbuild to do so, though perhaps some more options need to be piped through.

osbuild should be ready to go as of the next release (v138); images still needs osbuild/images@870e45f from osbuild/images#1117, but that could be split out from the RHEL 10 changes (I don't want to switch RHEL 10 until the boot.iso has been built with it for a bit).

@yasminvalim (Contributor)

FYI: during the FCOS community meeting, @jlebon brought up this issue for our awareness, and we discussed the proposal and the benefits of this change. You can read more about it in the meeting logs.

@Conan-Kudo

> > @hsiangkao Thanks! Those improvements look great!
>
> Another thing to mention is that currently fsck.erofs --extract doesn't support the fragment cache, so extraction will be slow (I suggest using mount and cp for extraction for now); I will find time to complete it later. But if anyone is interested, help on this would be much appreciated, since too many other things are in flight on my side. Also, -C# and -zlzma,dictsize=# will impact memory usage; if the smaller sizes work almost the same, it'd be better to use smaller numbers.

I can hold off on adjusting Calamares to use the EROFS extraction method until after you've implemented things to speed it up.

@jlebon (Member, Author) commented Jan 8, 2025

Was playing around with this using our FCOS live rootfs content. Some results:

| Command | Time | Size | Extraction Time |
| --- | --- | --- | --- |
| mksquashfs | 22s | 939M | 3.5s |
| mkfs.erofs | 1m30s | 1.1G | 14s |
| mkfs.erofs -Eall-fragments,fragdedupe=inode -C 1048576 | 1m6s | 977M | 6m |

So with the new options, the sizes are definitely comparable. It's still about 5 times slower than mksquashfs but in absolute terms 1m is not at all something I'm worried about for our use case.

Some additional notes:

  • The mkfs.erofs commands above used -zlzma,level=6. I tried to use -zzstd,level=15 to make it equivalent to what we were using with squashfs for comparison, but it seems like zstd support is experimental currently (mkfs.erofs outputs a warning) and it ends up taking longer and being larger anyway.
  • As mentioned above by @hsiangkao, extracting the EROFS image using the new options is incredibly slow as you can see in the table. This mostly doesn't matter, except that...
  • ... unlike unsquashfs, there is no way currently to extract just a single file from the image (see the sketch after this list). Some users of our root squashfs currently need this in an unprivileged context where they can't mount. They could extract the whole image for now, but that's a lot more expensive if we want to use -Eall-fragments (and even if that becomes faster, they'd still need to pay the storage cost).
  • Looks like there's a bug in mkfs.erofs: it hard-crashes instead of cleanly erroring out if there's a file it doesn't have permission to read (e.g. /etc/gshadow has mode 000).
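
For reference, the single-file extraction that squashfs users rely on looks like this with unsquashfs (image and file names are illustrative); this is the capability that has no EROFS equivalent yet:

```sh
# Pull a single file out of the image without mounting anything
# (works in an unprivileged context; the path is relative to the image root)
unsquashfs -d outdir root.squashfs etc/os-release
```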

@jlebon jlebon removed the meeting topics for meetings label Jan 8, 2025
@hsiangkao

> Was playing around with this using our FCOS live rootfs content. Some results:
>
> | Command | Time | Size | Extraction Time |
> | --- | --- | --- | --- |
> | mksquashfs | 22s | 939M | 3.5s |
> | mkfs.erofs | 1m30s | 1.1G | 14s |
> | mkfs.erofs -Eall-fragments,fragdedupe=inode -C 1048576 | 1m6s | 977M | 6m |
>
> So with the new options, the sizes are definitely comparable. It's still about 5 times slower than mksquashfs but in absolute terms 1m is not at all something I'm worried about for our use case.

Could you share a way for me to reproduce that as well? I will look into it.

> • Looks like there's a bug in mkfs.erofs: it hard-crashes instead of cleanly erroring out if there's a file it doesn't have permission to read (e.g. /etc/gshadow has mode 000).

Will check, thanks for reporting.

@jlebon (Member, Author) commented Jan 10, 2025

> Could you share a way for me to reproduce that as well? I will look into it.

You can download the live rootfs from https://fedoraproject.org/coreos/download?stream=stable#download_section, unpack it using cpio -id, and then you'll find the root squashfs in there. I unsquashfs'ed it, and then played with that rootfs tree.
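
Roughly, those steps (the artifact filename varies by release, and the squashfs name inside the CPIO is an assumption here):

```sh
# Download the live rootfs artifact from the FCOS download page first, then:
mkdir rootfs-contents && cd rootfs-contents
cpio -id < ../fedora-coreos-live-rootfs.x86_64.img  # unpack the CPIO archive
unsquashfs -d rootfs root.squashfs                  # extract the root tree to ./rootfs
```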

@hsiangkao commented Jan 21, 2025

> You can download the live rootfs from https://fedoraproject.org/coreos/download?stream=stable#download_section, unpack it using cpio -id, and then you'll find the root squashfs in there. I unsquashfs'ed it, and then played with that rootfs tree.

Sigh... I've fixed an issue which caused unexpectedly larger image sizes, and committed it to -experimental for testing.

| Command | Size (bytes) |
| --- | --- |
| mksquashfs -comp zstd | 732811264 |
| mkfs.erofs -zzstd,15 | 747646976 |
| mksquashfs -comp xz | 701059072 |
| mkfs.erofs -zlzma,6 | 687501312 |

mksquashfs command line: `-comp xz -b 131072 -noappend` or `-comp xz -b 131072 -Xcompression-level 15 -noappend`
mkfs.erofs command line: `-C131072 -Eall-fragments,fragdedupe=inode`
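
Spelled out as full invocations, assuming a source tree in rootfs/ and adding output names that the abbreviated command lines above omit, that would be something like:

```sh
# squashfs build (xz variant from the table above)
mksquashfs rootfs/ root.squashfs -comp xz -b 131072 -noappend

# EROFS build with the fragment options under discussion
mkfs.erofs -zlzma,6 -C131072 -Eall-fragments,fragdedupe=inode root.erofs rootfs/
```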

Anyway, just some update.

@jlebon (Member, Author) commented Jan 30, 2025

OK, redid a bunch of tests. This is with the latest erofs-utils work:

| Command | Time | Size |
| --- | --- | --- |
| mksquashfs -comp zstd (level 15) | 22.5s | 699M |
| mksquashfs -comp xz | 44.4s | 668M |
| mkfs.erofs -zlzma,6 -Eall-fragments,fragdedupe=inode -C131072 | 47.9s | 656M |
| mkfs.erofs -zlzma,6 -Eall-fragments,fragdedupe=inode -C1048576 | 57.1s | 629M |

So mkfs.erofs still takes longer (though less time than in the first round of testing), but it's now consistently smaller.

I didn't consistently test extraction time since they're all pretty fast now (roughly: unsquashfs is insanely fast, and fsck.erofs --extract is around 10s), so it does seem like the problem of extraction taking multiple minutes is fixed. Anyway, that's also less important to me now that there is support for extracting single files, which was my original concern.

> Also, -C# and -zlzma,dictsize=# will impact memory usage; if the smaller sizes work almost the same, it'd be better to use smaller numbers.

Do you mean memory usage during compression or memory usage when the filesystem is mounted?

@hsiangkao

> unsquashfs is insanely fast, and fsck.erofs --extract is around 10s

That's just because unsquashfs supports multi-threaded extraction; I don't have enough time to work on that for now.

> Do you mean memory usage during compression or memory usage when the filesystem is mounted?

Yes; it'd be better to always use smaller configurations if possible.
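
In other words, something along these lines, keeping both the physical cluster size and the LZMA dictionary small (the values are illustrative, not a recommendation from this thread):

```sh
# Smaller -C (pcluster size) and a smaller LZMA dictionary cap memory use
# both during compression and when the filesystem is mounted
mkfs.erofs -zlzma,6,dictsize=131072 -C131072 -Eall-fragments,fragdedupe=inode root.erofs rootfs/
```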

@dustymabe dustymabe added the status/pending-next-release Fixed upstream. Waiting on a next release. label Feb 19, 2025
@Conan-Kudo

Just for reference, this has also been done for all kiwi-based images in Fedora.

[Two comments have been minimized.]

@supakeen commented Feb 20, 2025

Also done here: osbuild/images#1239. I'll tag a release for it soon. This affects the Fedora IoT Installer deliverable in Fedora's build infra and some of our internal image types. While it's not a live installer, it's nice to be consistent.
