[reconfigurator] Add planner support for starting new Crucible pantries #6836

Merged
jgallagher merged 3 commits into main from john/plan-pantry on Oct 14, 2024

Conversation

jgallagher

This is a much smaller change than the diff stat implies; most of the changes are expectorate outputs because the example system we set up for tests now includes Crucible pantry zones, which shifted a bunch of other zone UUIDs.

Fully supporting Crucible pantry replacement depends on #3763, which I'm continuing to work on. But the reconfigurator side of "start new pantries" is about as trivial as things go and does not depend on #3763, hence this PR.

@jgallagher

Testing on a4x2:

After handoff to Nexus, I confirmed there were three pantry zones running and present in the internal DNS records:

root@oxz_switch:/tmp# omdb db dns names internal 1
...
_crucible-pantry._tcp                              (records: 3)
      SRV  port 17000 61101e75-a656-4c17-b92c-3847ce940095.host.control-plane.oxide.internal
      SRV  port 17000 7768ec1d-a3e8-4ddd-9fc1-63fbc4c166c5.host.control-plane.oxide.internal
      SRV  port 17000 8db80ac4-3dc7-4cc9-87c9-72611ba1ee3c.host.control-plane.oxide.internal
...

I then used reconfigurator-cli to expunge one pantry:

 MODIFIED SLEDS:

  sled 5dbdf6e2-0966-4da5-9533-142f0eda49aa (active):

    physical disks at generation 1:
    -------------------------------------------------------------
    vendor             model                serial
    -------------------------------------------------------------
    synthetic-vendor   synthetic-model-U2   synthetic-serial-g1_0
    synthetic-vendor   synthetic-model-U2   synthetic-serial-g1_1
    synthetic-vendor   synthetic-model-U2   synthetic-serial-g1_2
    synthetic-vendor   synthetic-model-U2   synthetic-serial-g1_3
    synthetic-vendor   synthetic-model-U2   synthetic-serial-g1_4


    omicron zones generation 5 -> 6:
    ---------------------------------------------------------------------------------------------
    zone type         zone id                                disposition    underlay IP
    ---------------------------------------------------------------------------------------------
    boundary_ntp      54276391-67d1-48bd-81a1-900a9b726408   in service     fd00:1122:3344:102::d
    cockroach_db      9a0a9f49-3aa4-4bf9-b783-cb16f1c4bd2f   in service     fd00:1122:3344:102::4
    cockroach_db      9bd8fa4b-70ed-495e-99b6-3a557f08a17a   in service     fd00:1122:3344:102::3
    crucible          42125d5e-25c3-489e-b32b-624b092be1ee   in service     fd00:1122:3344:102::b
    crucible          aa626888-bfeb-4040-aa76-eccb4487d4d1   in service     fd00:1122:3344:102::9
    crucible          cc52a30b-3842-4b7e-acf8-328cf163fe4d   in service     fd00:1122:3344:102::8
    crucible          d3636785-3acd-408e-ab40-07d9e6ce6cba   in service     fd00:1122:3344:102::a
    crucible          e4b957b2-872f-4a1e-ac5b-5a5375501736   in service     fd00:1122:3344:102::c
    internal_dns      148cfc7e-b661-47c5-b755-427624822642   in service     fd00:1122:3344:2::1
    nexus             ac30513d-622d-4a24-b4c0-97182f8b5c7f   in service     fd00:1122:3344:102::5
    oximeter          1df1264a-5826-4a3f-bc0e-dffcf3c515ab   in service     fd00:1122:3344:102::6
*   crucible_pantry   8db80ac4-3dc7-4cc9-87c9-72611ba1ee3c   - in service   fd00:1122:3344:102::7
     └─                                                      + expunged

I made this edited blueprint the target, then waited for that pantry zone to be expunged. It was removed from DNS, as expected:

root@oxz_switch:/tmp# omdb db dns show
GROUP    ZONE                         ver UPDATED              REASON
internal control-plane.oxide.internal 2   2024-10-11T18:53:31Z blueprint 148e7ef3-e553-4bac-bf98-4ebe6b71ff15 ()
external oxide.test                   2   2024-10-11T18:39:17Z create silo: "recovery"
root@oxz_switch:/tmp# omdb db dns names internal 2
...
  _crucible-pantry._tcp                              (records: 2)
      SRV  port 17000 61101e75-a656-4c17-b92c-3847ce940095.host.control-plane.oxide.internal
      SRV  port 17000 7768ec1d-a3e8-4ddd-9fc1-63fbc4c166c5.host.control-plane.oxide.internal
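
The drop from three records to two is what you would expect if the `_crucible-pantry._tcp` SRV targets are derived only from pantry zones whose disposition is still "in service". A minimal sketch of that filtering, assuming hypothetical `Zone` and `Disposition` types and a hypothetical `pantry_srv_targets` helper (none of these are the real Omicron types; only the `crucible_pantry` zone type and the host name suffix are taken from the output above):

```rust
/// Disposition of a zone in a blueprint, mirroring the "in service" and
/// "expunged" values shown in the diff output. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Disposition {
    InService,
    Expunged,
}

/// Hypothetical, simplified view of one row of the zone table.
pub struct Zone {
    pub id: String,
    pub kind: String,
    pub disposition: Disposition,
}

/// Return the SRV targets for every in-service pantry zone, sorted so the
/// result is stable. Expunged zones are skipped, which is the assumption
/// this sketch makes about how the records are derived.
pub fn pantry_srv_targets(zones: &[Zone]) -> Vec<String> {
    let mut targets: Vec<String> = zones
        .iter()
        .filter(|z| z.kind == "crucible_pantry" && z.disposition == Disposition::InService)
        .map(|z| format!("{}.host.control-plane.oxide.internal", z.id))
        .collect();
    targets.sort();
    targets
}
```

Feeding it the three pantry zones from this test, with `8db80ac4-...` marked `Expunged`, yields only the two targets listed above.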

Running the planner placed a new pantry on the same sled where I had expunged the old one. This is expected: I only set up a4x2 with three sleds, so after the expungement two sleds had one pantry each and one sled had none, and the planner placed the new pantry on the sled with none:

root@oxz_switch:/tmp# omdb -w nexus blueprints regenerate
generated new blueprint 1162c765-e7c4-44ee-a1f5-f21e3a09a846
root@oxz_switch:/tmp# omdb -w nexus blueprints diff current 1162c765-e7c4-44ee-a1f5-f21e3a09a846
from: blueprint 148e7ef3-e553-4bac-bf98-4ebe6b71ff15
to:   blueprint 1162c765-e7c4-44ee-a1f5-f21e3a09a846

 UNCHANGED SLEDS:
   ... snip ...

 MODIFIED SLEDS:

  sled 5dbdf6e2-0966-4da5-9533-142f0eda49aa (active):

    physical disks at generation 1:
    -------------------------------------------------------------
    vendor             model                serial
    -------------------------------------------------------------
    synthetic-vendor   synthetic-model-U2   synthetic-serial-g1_0
    synthetic-vendor   synthetic-model-U2   synthetic-serial-g1_1
    synthetic-vendor   synthetic-model-U2   synthetic-serial-g1_2
    synthetic-vendor   synthetic-model-U2   synthetic-serial-g1_3
    synthetic-vendor   synthetic-model-U2   synthetic-serial-g1_4


    omicron zones generation 6 -> 7:
    ---------------------------------------------------------------------------------------------
    zone type         zone id                                disposition   underlay IP
    ---------------------------------------------------------------------------------------------
    boundary_ntp      54276391-67d1-48bd-81a1-900a9b726408   in service    fd00:1122:3344:102::d
    cockroach_db      9a0a9f49-3aa4-4bf9-b783-cb16f1c4bd2f   in service    fd00:1122:3344:102::4
    cockroach_db      9bd8fa4b-70ed-495e-99b6-3a557f08a17a   in service    fd00:1122:3344:102::3
    crucible          42125d5e-25c3-489e-b32b-624b092be1ee   in service    fd00:1122:3344:102::b
    crucible          aa626888-bfeb-4040-aa76-eccb4487d4d1   in service    fd00:1122:3344:102::9
    crucible          cc52a30b-3842-4b7e-acf8-328cf163fe4d   in service    fd00:1122:3344:102::8
    crucible          d3636785-3acd-408e-ab40-07d9e6ce6cba   in service    fd00:1122:3344:102::a
    crucible          e4b957b2-872f-4a1e-ac5b-5a5375501736   in service    fd00:1122:3344:102::c
    crucible_pantry   8db80ac4-3dc7-4cc9-87c9-72611ba1ee3c   expunged      fd00:1122:3344:102::7
    internal_dns      148cfc7e-b661-47c5-b755-427624822642   in service    fd00:1122:3344:2::1
    nexus             ac30513d-622d-4a24-b4c0-97182f8b5c7f   in service    fd00:1122:3344:102::5
    oximeter          1df1264a-5826-4a3f-bc0e-dffcf3c515ab   in service    fd00:1122:3344:102::6
+   crucible_pantry   5f432186-69d2-417c-93c8-f4ee7331baac   in service    fd00:1122:3344:102::21
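
The placement reasoning above (top up to the target count, preferring the sled with the fewest pantries) can be sketched roughly as follows. This is an illustration of the idea only, not the planner's actual code: `place_pantries`, the sled names, and the tie-breaking by sled name are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: given the number of in-service pantries per sled and
/// a target total, return the sleds that should each receive one new pantry.
/// Each new pantry goes to the sled that currently has the fewest.
pub fn place_pantries(in_service: &BTreeMap<String, usize>, target: usize) -> Vec<String> {
    let mut counts = in_service.clone();
    let mut total: usize = counts.values().sum();
    let mut placements = Vec::new();

    while total < target {
        // `min_by_key` returns the first minimum, and a BTreeMap iterates in
        // key order, so ties are broken by sled name (an arbitrary choice
        // made for this sketch).
        let Some(sled) = counts
            .iter()
            .min_by_key(|(_, n)| **n)
            .map(|(sled, _)| sled.clone())
        else {
            break; // no sleds at all, nothing to place on
        };
        *counts.get_mut(&sled).unwrap() += 1;
        total += 1;
        placements.push(sled);
    }
    placements
}
```

With counts of 1, 0, and 1 across three sleds and a target of 3, the single placement lands on the sled with 0, matching the diff above.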

After making this new blueprint the target and waiting a bit, we were back to three pantries in DNS:

root@oxz_switch:/tmp# omdb db dns show
GROUP    ZONE                         ver UPDATED              REASON
internal control-plane.oxide.internal 3   2024-10-11T18:57:38Z blueprint 1162c765-e7c4-44ee-a1f5-f21e3a09a846 ()
external oxide.test                   2   2024-10-11T18:39:17Z create silo: "recovery"
root@oxz_switch:/tmp# omdb db dns names internal 3
...
_crucible-pantry._tcp                              (records: 3)
      SRV  port 17000 5f432186-69d2-417c-93c8-f4ee7331baac.host.control-plane.oxide.internal
      SRV  port 17000 61101e75-a656-4c17-b92c-3847ce940095.host.control-plane.oxide.internal
      SRV  port 17000 7768ec1d-a3e8-4ddd-9fc1-63fbc4c166c5.host.control-plane.oxide.internal

and the new zone was up and running:

root@g1:~# zoneadm list | grep 5f432186-69d2-417c-93c8-f4ee7331baac
oxz_crucible_pantry_5f432186-69d2-417c-93c8-f4ee7331baac
root@g1:~# zlogin oxz_crucible_pantry_5f432186-69d2-417c-93c8-f4ee7331baac
root@oxz_crucible_pantry_5f432186:~# svcs -a | grep pantry
online         18:57:28 svc:/oxide/crucible/pantry:default

and we can reach its status endpoint from another zone where internal DNS is set up amenably for curl:

root@oxz_ntp_8ee33c45:~# curl http://5f432186-69d2-417c-93c8-f4ee7331baac.host.control-plane.oxide.internal:17000/crucible/pantry/0
{"volumes":[],"num_job_handles":0}
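
The URL in that curl command is just the SRV target and port from the DNS listing plus the status path. A small sketch of that mapping, assuming the `SRV  port <port> <target>` line layout printed by `omdb` above (treated here as illustrative, not as a stable format) and the `/crucible/pantry/0` path copied from the curl command; `parse_srv_line` and `pantry_status_url` are hypothetical helpers:

```rust
/// Parse one line of the form `SRV  port 17000 <target>` into (port, target).
/// Returns None for any line that does not have that shape.
pub fn parse_srv_line(line: &str) -> Option<(u16, String)> {
    let mut parts = line.split_whitespace();
    if parts.next()? != "SRV" || parts.next()? != "port" {
        return None;
    }
    let port: u16 = parts.next()?.parse().ok()?;
    let target = parts.next()?.to_string();
    Some((port, target))
}

/// Build the status URL used in the curl check from an SRV line.
pub fn pantry_status_url(line: &str) -> Option<String> {
    let (port, target) = parse_srv_line(line)?;
    Some(format!("http://{target}:{port}/crucible/pantry/0"))
}
```

Lines such as the `_crucible-pantry._tcp ... (records: 3)` header do not start with `SRV`, so they yield `None` and can be skipped.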


@plotnick plotnick left a comment

Gotta love the easy ones! Looks just right to me.

Comment on lines -47 to -48
// Zones that we should place but don't yet.
| BlueprintZoneType::CruciblePantry(_)

@plotnick plotnick Oct 11, 2024

🎆

A lot of work by a lot of people went into removing this comment. Nice job everyone!

@sunshowers

sunshowers commented Oct 12, 2024

> which shifted a bunch of other zone UUIDs.

Ah interesting, the tabular output changed because crucible_pantry makes that column a bit wider now X_X

@jgallagher

Haha, I didn't even realize this; conveniently, "shifted" works for this as well as for the meaning I intended. Taking this sled change as an example:

-    ------------------------------------------------------------------------------------------
-    zone type      zone id                                disposition   underlay IP           
-    ------------------------------------------------------------------------------------------
-    clickhouse     44afce85-3377-4b20-a398-517c1579df4d   in service    fd00:1122:3344:103::23
-    crucible       38b047ea-e3de-4859-b8e0-70cac5871446   in service    fd00:1122:3344:103::2c
-    crucible       4644ea0c-0ec3-41be-a356-660308e1c3fc   in service    fd00:1122:3344:103::2b
-    crucible       55f4d117-0b9d-4256-a2c0-f46d3ed5fff9   in service    fd00:1122:3344:103::24
-    crucible       5c6a4628-8831-483b-995f-79b9126c4d04   in service    fd00:1122:3344:103::27
-    crucible       6a01210c-45ed-41a5-9230-8e05ecf5dd8f   in service    fd00:1122:3344:103::28
-    crucible       79552859-fbd3-43bb-a9d3-6baba25558f8   in service    fd00:1122:3344:103::25
-    crucible       90696819-9b53-485a-9c65-ca63602e843e   in service    fd00:1122:3344:103::26
-    crucible       c99525b3-3680-4df6-9214-2ee3e1020e8b   in service    fd00:1122:3344:103::29
-    crucible       f42959d3-9eef-4e3b-b404-6177ce3ec7a1   in service    fd00:1122:3344:103::2a
-    crucible       fb36b9dc-273a-4bc3-aaa9-19ee4d0ef552   in service    fd00:1122:3344:103::2d
-    internal_dns   7004cab9-dfc0-43ba-92d3-58d4ced66025   in service    fd00:1122:3344:1::1   
-    internal_ntp   c81c9d4a-36d7-4796-9151-f564d3735152   in service    fd00:1122:3344:103::21
-    nexus          b2573120-9c91-4ed7-8b4f-a7bfe8dbc807   in service    fd00:1122:3344:103::22
+    ---------------------------------------------------------------------------------------------
+    zone type         zone id                                disposition   underlay IP           
+    ---------------------------------------------------------------------------------------------
+    clickhouse        44afce85-3377-4b20-a398-517c1579df4d   in service    fd00:1122:3344:103::23
+    crucible          38b047ea-e3de-4859-b8e0-70cac5871446   in service    fd00:1122:3344:103::2c
+    crucible          4644ea0c-0ec3-41be-a356-660308e1c3fc   in service    fd00:1122:3344:103::2b
+    crucible          5c6a4628-8831-483b-995f-79b9126c4d04   in service    fd00:1122:3344:103::27
+    crucible          6a01210c-45ed-41a5-9230-8e05ecf5dd8f   in service    fd00:1122:3344:103::28
+    crucible          79552859-fbd3-43bb-a9d3-6baba25558f8   in service    fd00:1122:3344:103::25
+    crucible          90696819-9b53-485a-9c65-ca63602e843e   in service    fd00:1122:3344:103::26
+    crucible          a9a6a974-8953-4783-b815-da46884f2c02   in service    fd00:1122:3344:103::2e
+    crucible          c99525b3-3680-4df6-9214-2ee3e1020e8b   in service    fd00:1122:3344:103::29
+    crucible          f42959d3-9eef-4e3b-b404-6177ce3ec7a1   in service    fd00:1122:3344:103::2a
+    crucible          fb36b9dc-273a-4bc3-aaa9-19ee4d0ef552   in service    fd00:1122:3344:103::2d
+    crucible_pantry   55f4d117-0b9d-4256-a2c0-f46d3ed5fff9   in service    fd00:1122:3344:103::24
+    internal_dns      7004cab9-dfc0-43ba-92d3-58d4ced66025   in service    fd00:1122:3344:1::1   
+    internal_ntp      c81c9d4a-36d7-4796-9151-f564d3735152   in service    fd00:1122:3344:103::21
+    nexus             b2573120-9c91-4ed7-8b4f-a7bfe8dbc807   in service    fd00:1122:3344:103::22

The new crucible_pantry zone has ID 55f4d117-0b9d-4256-a2c0-f46d3ed5fff9 and underlay IP ...::24, which were previously the ID and IP of one of the crucible zones, and there is now a crucible zone with a new ID/IP (a9a6a974-8953-4783-b815-da46884f2c02 / ...::2e). So even if the columns hadn't shifted visually to the right, some of the IDs/IPs would still have shifted logically.
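
One way to picture the logical shift: if the example system hands out IDs and IPs from a fixed sequence in zone-creation order, then creating a pantry before the crucible zones consumes the value the first crucible used to get, and every later zone moves down by one. The sketch below shows this for the last IP segment only. The sequential allocator and the creation order are assumptions inferred from the addresses in the diff above, and `assign_sequential` is a hypothetical helper, not the test harness's real code.

```rust
/// Assign consecutive values (think: the last segment of the underlay IP)
/// to zones in creation order, starting at `first`.
pub fn assign_sequential(kinds: &[&str], first: u16) -> Vec<(String, u16)> {
    kinds
        .iter()
        .enumerate()
        .map(|(i, kind)| (kind.to_string(), first + i as u16))
        .collect()
}
```

Starting at `0x21` with `internal_ntp, nexus, clickhouse, crucible, ...`, the first crucible gets `0x24`. Inserting `crucible_pantry` ahead of the crucibles gives the pantry `0x24` and pushes the last crucible one value past the old maximum, which is the same pattern as `...::24` moving to the pantry and a crucible appearing at `...::2e`.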

@jgallagher jgallagher merged commit 6fb91c6 into main Oct 14, 2024
16 checks passed
@jgallagher jgallagher deleted the john/plan-pantry branch October 14, 2024 13:47
jgallagher added a commit that referenced this pull request Oct 14, 2024
#6788 added expectorate tests of `reconfigurator-cli`, and #6836 changed
behavior that affected those tests but not in a way that conflicted at
the git level. This catches us up to both changes.