ebsnvme-id creates broken sd* symlinks #37

Open
martinpitt opened this issue May 29, 2024 · 7 comments


martinpitt commented May 29, 2024

We spent quite some time debugging a storage test regression in Fedora rawhide which breaks scsi_debug and other devices, but only on Red Hat's/Fedora's Testing Farm infrastructure -- which is essentially AWS EC2 machines with an API.

Latest Fedora rawhide instances now have amazon-ec2-utils-2.2.0-2.fc41.noarch (which got introduced into Fedora very recently), which ships /usr/lib/udev/rules.d/70-ec2-nvme-devices.rules with

KERNEL=="nvme[0-9]*n[0-9]*",        ENV{DEVTYPE}=="disk",      ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/usr/sbin/ebsnvme-id -u /dev/%k", SYMLINK+="%c"
KERNEL=="nvme[0-9]*n[0-9]*p[0-9]*", ENV{DEVTYPE}=="partition", ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/usr/sbin/ebsnvme-id -u /dev/%k", SYMLINK+="%c%n"

These instances have an NVMe block device, and these rules cause the following symlinks to be created:

lrwxrwxrwx. 1 root root 7 May 29 03:52 /dev/sda1 -> nvme0n1
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda11 -> nvme0n1p1
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda12 -> nvme0n1p2
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda13 -> nvme0n1p3
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda14 -> nvme0n1p4

This is problematic in multiple ways:

  • Pretending that these are SCSI drives tramples on the kernel's namespace. udev symlinks should never create names which the kernel uses.
  • "nvme0n1" is the raw block device, not a partition, so naming it "sda1" is very confusing; it should be "sda". Likewise, the first partition should be "sda1", not "sda11".

If a real sda then comes along (e.g. with modprobe scsi_debug), the kernel creates an actual /dev/sda, but it's impossible to create or see partitions on it, as the sda1 etc. names are already taken.
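To check whether a machine is affected, something like the following could be used. This is only a rough sketch: `find_colliding_symlinks` and the `KERNEL_SD_NAME` pattern are my own names for illustration and are not part of amazon-ec2-utils or udev. It reports every symlink in a directory whose name looks like a kernel SCSI disk name (sda, sda1, sdb12, ...). Real kernel devices are block device nodes, not symlinks, so anything it reports was put there by a udev rule.

```python
import os
import re

# Kernel-style SCSI disk/partition names: sda, sdb1, sdaa12, ...
# (hypothetical helper pattern, only for this illustration)
KERNEL_SD_NAME = re.compile(r"^sd[a-z]+[0-9]*$")


def find_colliding_symlinks(dev_dir="/dev"):
    """Return {name: target} for symlinks in dev_dir that use sd* kernel names."""
    collisions = {}
    for entry in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, entry)
        # Only symlinks count; a real /dev/sda is a device node.
        if KERNEL_SD_NAME.match(entry) and os.path.islink(path):
            collisions[entry] = os.readlink(path)
    return collisions
```

Calling `find_colliding_symlinks()` with the default `/dev` on one of the affected instances should list the sda1, sda11, ... entries shown above.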

This is most easily reproduced with

# /usr/sbin/ebsnvme-id -u /dev/nvme0n1
sda1

Curiously, it also does that for a partition:

# /usr/sbin/ebsnvme-id -u /dev/nvme0n1p2
sda1

That explains how the second udev rule can even work -- but this is really hackish!
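To make the naming explicit: in udev rules, `%c` expands to the output of PROGRAM and `%n` to the kernel number of the device (the trailing digits of the kernel name). The sketch below imitates the two rules under those assumptions. `kernel_number` and `symlink_for` are hypothetical helpers, and the regular expression is a simplification of the rules' KERNEL/DEVTYPE matching, not how udev itself matches.

```python
import re


def kernel_number(kernel_name):
    """Approximate udev's %n: the trailing digits of the kernel device name."""
    match = re.search(r"(\d+)$", kernel_name)
    return match.group(1) if match else ""


def symlink_for(kernel_name, program_output):
    """Expand SYMLINK+="%c" for whole disks and SYMLINK+="%c%n" for partitions."""
    # Simplified stand-in for the ENV{DEVTYPE}=="partition" check in the rule.
    is_partition = re.fullmatch(r"nvme\d+n\d+p\d+", kernel_name) is not None
    if is_partition:
        return program_output + kernel_number(kernel_name)
    return program_output
```

With ebsnvme-id printing "sda1" for both the disk and its partitions, `symlink_for("nvme0n1", "sda1")` gives "sda1" and `symlink_for("nvme0n1p2", "sda1")` gives "sda12", which matches the symlinks listed above.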

My recommendation as former udev co-upstream is to remove these rules entirely. They are unhelpful and confusing, and they break stuff. You can of course create symlinks in subdirs of /dev all you like, but please don't collide with kernel names.

Thanks!

@martinpitt
Author

@mvollmer @major FYI -- @major, do you want me to file this as a Fedora bz, too? This could very well affect other rawhide users/tests, and it has already cost us about 10 hours of our lives..

@martinpitt
Author

Note: This only affects Fedora rawhide because Testing Farm Fedora 40 instances don't install amazon-ec2-utils by default. When I install it manually, the issue happens there as well.

martinpitt added a commit to martinpitt/cockpit that referenced this issue May 29, 2024
Rawhide Testing Farm machines started to get a set of symlinks like
/dev/sda1 -> nvme0n1 (but *no* /dev/sda), via amazon-ec2-utils
(amazonlinux/amazon-ec2-utils#37).

They break `scsi_debug`, as that creates /dev/sda -- but then trying to
create partitions on it doesn't have any namespace room for /dev/sda1
etc., as that is already taken. This breaks all storage tests which use
a RAM disk.

That package isn't yet installed in Fedora 39/40, only rawhide. We don't
need it and it only causes trouble → it can go.

Fixes cockpit-project#20520
@mvollmer

@martinpitt, thanks for filing this! I have a hard time understanding what problem these symlinks are trying to solve. They only seem to create chaos.

If they are supposed to help with giving stable names to NVMe drives, I think that problem is already solved by ID_SERIAL, ID_WWN, and filesystem UUIDs.
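For reference, the stable names udev already provides can be looked up without any extra rules. This is a minimal sketch, assuming the standard /dev/disk/by-id directory that udev populates from properties such as ID_SERIAL and ID_WWN; `stable_names` is a hypothetical helper name.

```python
import os


def stable_names(device, by_id_dir="/dev/disk/by-id"):
    """Return the names in by_id_dir whose symlinks resolve to device."""
    wanted = os.path.realpath(device)
    names = []
    for entry in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, entry)
        # Each by-id entry is a symlink pointing back at the kernel device node.
        if os.path.islink(path) and os.path.realpath(path) == wanted:
            names.append(entry)
    return names
```

Something like `stable_names("/dev/nvme0n1")` would list the existing persistent names for the disk. Passing "/dev/disk/by-uuid" as `by_id_dir` together with a partition device does the same for filesystem UUIDs.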

@martinpitt
Author

https://gitlab.com/testing-farm/infrastructure doesn't actually install that package -- I figure it's now part of the official Fedora rawhide AMIs?

martinpitt added a commit to cockpit-project/cockpit that referenced this issue May 29, 2024
@major

major commented May 29, 2024

> @mvollmer @major FYI -- @major, do you want me to file this as a Fedora bz, too? This could very well affect other rawhide users/tests, and it has already cost us about 10 hours of our lives..

@martinpitt That would be helpful. Thanks for detailing the problems you found. I missed these during testing!

@martinpitt
Author

@major OK, I filed https://bugzilla.redhat.com/show_bug.cgi?id=2284397 . Thanks!

@tbzatek

tbzatek commented Sep 23, 2024

> @mvollmer @major FYI -- @major, do you want me to file this as a Fedora bz, too? This could very well affect other rawhide users/tests, and it has already cost us about 10 hours of our lives..

This has cost me and @vojtechtrefny an hour or two of our lives as well: https://bugzilla.redhat.com/show_bug.cgi?id=2313526

cowboyox pushed a commit to cowboyox/cockpit that referenced this issue Oct 8, 2024