install-darwin: fix _nixbld uids for macOS sequoia #10919
Is there any chance we’ll be able to automate this transition on upgrade? Getting the word out to every single user that they need to run a script seems painful. And if we’re going to get everyone to run a migration script, is there any chance we can move the builder group to GID 330 at the same time so it stops showing up in System Settings? I can try and test this in a VM someday soon. |
That's a Nix-nix question, I guess. I'm not personally aware of any mechanism for running a stateful script when people update Nix and don't see one on
I've wondered why people were mentioning the GID, since we haven't needed to change it previously. (I assumed people were just conflating the 30000 GID we set with the 301 I gather the detsys installer sets?) I'm less sure about migrating it. Inevitably we can change it, but I don't know if doing so will "break" the existing groups. (If it did, we might have to rework the script a little to just remove the users + group and re-add them all?) |
Is there a reason that every nix install needs the same UIDs for these users? Furthermore, is there a reason they need to be consecutive? Does the main nix builder code deal with the UIDs directly or look them up from the names? It appears from the user-reported error in #10912 that it uses the names, in which case the installer would be free to pick whatever UIDs happen to be free on a given OS install when creating them… That would be a simpler solution for the install/upgrade script and it would work everywhere. Trying to pick an empty span is only ever guaranteed to work on a fresh OS install; there's always going to be somebody out there who's got a user in that range created by hand or by some other software. As far as upgrading goes, perhaps the script should be more nixos/puppet-like (declarative/idempotent) and create the users only if they don't exist. It would also be super convenient if the code that got the error in #10912 could call the upgrade script and fix it right there, but I don't know if it has the correct permissions at that point in time. It could at least mention your upgrade script or prompt the user to re-install.
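To make the "pick whatever UIDs happen to be free" idea concrete, here is a minimal sketch of a search for a free consecutive span. The function name `first_free_span` is hypothetical and not part of the Nix installer. The list of used UIDs is passed in as arguments; on macOS it could presumably be gathered from something like `dscl . -list /Users UniqueID`, which is an assumption and not something this PR does.

```shell
# Hypothetical helper, not part of the Nix installer.
# Usage: first_free_span START END COUNT [USED_UID...]
# Prints the first UID of the lowest run of COUNT consecutive unused UIDs
# within [START, END]; returns 1 if no such run exists.
# The body runs in a subshell so its variables do not leak to the caller.
first_free_span() (
    start=$1 end=$2 count=$3
    shift 3
    used=" $* "   # pad with spaces so we only match whole numbers
    uid=$start
    run=0
    while [ "$uid" -le "$end" ]; do
        case $used in
            *" $uid "*) run=0 ;;   # UID is taken: the run is broken
            *)
                run=$((run + 1))
                if [ "$run" -eq "$count" ]; then
                    echo $((uid - count + 1))
                    return 0
                fi
                ;;
        esac
        uid=$((uid + 1))
    done
    return 1
)

# Example: UIDs 301-304 are taken; look for 32 free UIDs between 300 and 400.
first_free_span 300 400 32 301 302 303 304
```

A search like this only helps at install time. As the replies below note, it does not protect existing users from an OS update that later claims the chosen UIDs.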
To the best of my understanding (as someone not very familiar with the codebase for Nix itself), the answer to all of these is: not really.
Indeed. But reworking this means modifying how we install users on all platforms, having to wrestle with edge cases that the current installer's process is too ~dumb to have to worry about (like what to do if the UIDs we want are taken by old nixbld users that may not match the While this refactor might save us from having to relocate to a new range a few years in the future, it would not fix the underlying problem here--this macOS update's installer currently clobbers our existing users to take the UIDs for its own daemons. This could of course happen to any UID we use on any update to any existing install on macOS--either they'll have to stop doing this without relocating our users, or Nix itself needs to get smart enough to detect the situation and recover from it or suggest remediation steps to the user.
Not sure if you mean this in the context of the migration script, or the installer. If the latter, I broadly agree--but full idempotence is tricky to reach and maintain, and a lot of idempotence-focused work can lead to minimal benefit if/when one or two things block full idempotence (i.e., we do the work and testing, but the installer will still break or bail somewhere and we still have to tell frustrated users to go manually uninstall and reinstall). I had thoughts and laid down some patterns for getting us here a few years back, but at this point I imagine this is more likely to come from working on the NixOS org's fork of the detsys installer (directly or by contributing to the upstream detsys installer itself). That said, I'll note that macOS eminent-domaining our UIDs and clobbering the users in the process is also causing trouble for their installer. (IIRC it breaks their ability to do an uninstall, for example.)
I agree, but I think that's a nix-nix question outside of the scope of this PR (and I imagine it would be better if that took the form of a more general user fixup routine instead of having to figure out how to suggest macos version-specific cleanup to only the right users). I'll also note that--unless Apple changes the updater to be a bit more polite--the cake is mostly-baked here. Even if someone opened a PR to support this in Nix today and there was a release cut by the end of the week, some fraction of Nix's macOS users will not be using that Nix release when they take the Sequoia update (whether that's a beta this summer or the official release this fall). |
I’m sorry that I didn’t yet get around to testing the migration; I will try to do so soon. @abathur How do you feel about trying to land the UID (and preferably GID) changes for new installs only – which has to be done regardless – and we can worry about migration when it becomes clearer if Sequoia is going to implement any kind of migration itself? |
No worries :)
I'm not opposed to changing the GID (whether here or in a separate PR to ensure both are easy to revert from GH without unrelated regressions), but I'm conservative about fiddling with these and do need some convincing:
If there is an issue pointing me that way is fine, but if not could you open one and document what you're seeing there (ideally w/ screenshots)? |
(Sonoma 14.5) I fiddled with a bunch of combinations along these lines and a bunch of different IDs but the results were pretty clear; as I mentioned in #10892 (comment) the threshold for groups seems to be 500 for whatever reason, but of course picking one that matches the UID we’ll always take up seems the most conservative choice and Apple do use GID 395–400 and 441 as of Sonoma. I don’t know if I can prove the negative re: the group ID potentially causing problems, but I can’t personally foresee any problems that we wouldn’t already get with UIDs. I’m open to trying to find the time to test stuff in a VM if you have some proposals for ways to test things, though. From reading the old discussion from when we moved the UIDs, I get the impression we were just too preoccupied with those more pressing issues to think about whether there might be any side‐effects of having a GID outside the system range too. |
To clarify, I don't mean that I won't PR the change without meeting that standard of proof--I think your comment here reasonably demonstrates the issue and shows that a lower UID addresses it. I just meant that one reason I'm treading cautiously is that I can't just ~transfer confidence from looking at the other installers frontrunning us on this for days/weeks/months with issues/PRs to document the problem+fix and a lack of subsequent reports I could take as supporting evidence. |
Oh yeah I totally understand the conservatism here, don’t worry. I just figure if we’re on Apple’s wild ride for the time being anyway we might as well improve the UX, especially if we do end up having to make everyone do a manual migration. I guess if I find the time to test the Sequoia migration I can make a group in the system range before the upgrade and see what happens to it? I think the DetSys installer has some kind of A/B testing roll‐out stuff. I don’t know how quickly we could get data on potential GID problems with that though. |
Force-pushed from dbdbd95 to 0365ca7.
scripts/install-darwin-multi-user.sh (outdated)
export NIX_FIRST_BUILD_UID="${NIX_FIRST_BUILD_UID:-331}"
export NIX_BUILD_GROUP_ID="${NIX_BUILD_GROUP_ID:-331}"
This might be silly, but – is there any chance of this group being associated with _nixbld1 specifically because of sharing the GID with it? If so, we could avoid that by picking 330 instead. I guess that per‐user groups sharing the IDs of their corresponding users is just a convention so hopefully nothing in the system is going to assume it and cause some kind of weird behaviour down the road, but all this system role ID stuff makes me superstitious enough to worry about it.
I hope not, but that's definitely in the class of risk that makes me a bit cautious about changing it without a reason (but I think you've provided a sufficiently-good reason to try it).
Hopefully we'll have some degree of confirmation from people running this for at least a few days before merge.
Should we pick 330 instead just in case? That matches the status quo on Linux (30000 ≠ 30001), the previous status quo on Darwin (30000 ≠ 301), and doesn’t seem to have any additional risks.
I’m overdue for reinstalling my crusty old Nix setup anyway, so I will probably try this out soon as long as I can get the Darwin sandbox actually fixed.
Reposting this inline since it’s hidden in the commit comments currently: Maybe we should make the group name |
Force-pushed from e1776b0 to 4af73b4.
Latest force-push is just a rebase to see if the installer jobs will run. |
Installer jobs did run. (For context, they were broken because GH has switched over to arm macs. The upshot is that, while the test-generated installers covered x86_64-darwin, they now cover aarch64-darwin instead. That's pretty great for our case here :) Installer tests in my fork succeeded; individuals can test with (note that test installers aren't generated for all platforms):
@tomberek This is a PR that should fix that issue.
(FWIW: we learned more about Apple’s recommended UID range for role users, and Determinate Systems have adopted an approach based on that temporarily, so this probably needs a bit more research before we can commit fully to the approach. I’ve been somewhat negligent at getting around to doing VM testing but hope to get around to it soon.) |
I just set mine at 2000, does it have to be < 500? I know it'll pollute the UI. |
@randallb We don't know. If the underlying problem documented in the issue below still exists, you may get booted into recovery mode on macOS updates. We know moving to the role user range (200-400) fixed that problem at that time, and I am not personally aware of any user reports that explicitly attest to the absence of that problem on update with uids outside of this range. |
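As a rough illustration of the range mentioned above, the sketch below checks whether a UID falls inside 200-400. The bounds are taken from this comment and treated as inclusive, which is an assumption. The function name `in_role_user_range` is hypothetical.

```shell
# Hypothetical check, not part of the Nix installer.
# Succeeds if the given UID lies in the role user range quoted above
# (200-400, treated as inclusive here).
in_role_user_range() {
    [ "$1" -ge 200 ] && [ "$1" -le 400 ]
}

# Example: 350 is inside the range, 2000 is not.
for uid in 350 2000; do
    if in_role_user_range "$uid"; then
        echo "$uid: inside the role user range"
    else
        echo "$uid: outside the role user range"
    fi
done
```

Passing this check says nothing about whether macOS will leave that UID alone on a future update. It only mirrors the range known to have fixed the recovery-mode problem.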
What if you did what Linux is already doing: detect the UIDs/GIDs in use and find free ones that work?
Force-pushed from db37199 to 0d75be1.
Starting in macOS 15 Sequoia, macOS daemon UIDs are encroaching on our default UIDs of 301-332. This commit relocates our range up to avoid clashing with the current UIDs of 301-304 and buy us a little time while still leaving headroom for people installing more than 32 users.
I believe this is ready, but the backport labels down to 2.18 should be applied first, as otherwise users tracking the Nix version used by 24.05 or other intermediate versions will get broken Sequoia upgrades. Automatic migration would be nice but should be handled another time. |
@tomberek I agree with merging as-is--no blockers from my perspective. (edit: aside from Emily's point above.) |
@abathur The migration script uses PrimaryGroupID 30001. Something doesn't seem right. This used to be 30000 and is being changed to 350. Is the idea that this marks those users as migrated, or is this an off-by-one that has been gracefully handled so far? |
…0919 install-darwin: fix _nixbld uids for macOS sequoia (backport #10919)
I think you're right--it does look like I made a mistake. For the record (since this is actually about the migration script added in the other PR), we're discussing the last line of the function below:
I'll brain-dump a little: Our final decision was that the migration script should not affirmatively touch the group ID at all. An early version of the migration script did affirmatively change it to 350 (via Since the main reason to change the GID--making sure the nixbld group doesn't show up in Users & Groups--is something people with existing installs weren't complaining too loudly about, we decided it didn't make sense to fiddle with it during migration. It looks like I copied in the wrong number when I ~reversed the implementation to match that decision:
I'm not quite sure why it's working (but a few people did report it working during some basic testing), but I can think of at least two possible explanations:
Now that we're focused on it, though, I think the even-more-correct thing to do here is probably to look up the existing group ID (it looks like the installer uses something like Can't start on a PR atm, but I might be able to this evening if someone else hasn't already opened one by then. |
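A minimal sketch of the "look up the existing group ID" idea follows. The command the comment refers to is missing from this page, so the `dscl` invocation in the code comment is an assumption, as is the helper name `parse_primary_group_id`. The parser reads `dscl`-style `Key: value` output from stdin, so it can be tried without a macOS machine.

```shell
# Hypothetical helper, not taken from the migration script.
# Reads dscl-style output (e.g. "PrimaryGroupID: 30000") on stdin and
# prints just the numeric ID from the first matching line.
parse_primary_group_id() {
    awk '$1 == "PrimaryGroupID:" { print $2; exit }'
}

# On macOS the input would presumably come from something like:
#   dscl . -read /Groups/nixbld PrimaryGroupID | parse_primary_group_id
# Here we feed it a sample line instead.
printf 'PrimaryGroupID: 30000\n' | parse_primary_group_id
```

Reusing whatever ID the group already has would avoid hardcoding a number such as 30001 in the migration script.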
Motivation
Starting in macOS 15 Sequoia, macOS daemon UIDs are encroaching on our default UIDs of 301-332. This commit relocates our range up to avoid clashing with the current UIDs of 301-304 and buy us a little time while still leaving headroom for people installing more than 32 users.
It also adopts GID 350 (same as first UID), since @emilazy pointed out that this will keep our build group from showing up in the Users & Groups interface. (See #10919 (comment))
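The arithmetic behind the new range can be sketched as follows, assuming a first UID of 350 (as stated above) and the default of 32 build users implied by the old 301-332 range. The variable names and the `ranges_overlap` helper are illustrative, not the installer's own.

```shell
# Illustrative arithmetic only; names are not taken from the installer.
first_uid=350    # new first build UID, per the description above
user_count=32    # default number of build users (old range was 301-332)
last_uid=$((first_uid + user_count - 1))
echo "build users: ${first_uid}-${last_uid}"

# Succeeds if the inclusive ranges [a_start, a_end] and [b_start, b_end]
# share at least one ID.
# Usage: ranges_overlap A_START A_END B_START B_END
ranges_overlap() {
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# The old default range collides with the UIDs 301-304; the new one does not.
ranges_overlap 301 332 301 304 && echo "old range clashes with 301-304"
ranges_overlap "$first_uid" "$last_uid" 301 304 || echo "new range is clear of 301-304"
```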
Context
Priorities and Process
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.