Blueprint execution stuck on sled-agent failed requests #7373

Closed
askfongjojo opened this issue Jan 17, 2025 · 13 comments

askfongjojo commented Jan 17, 2025

In a racklet environment, I ran into blueprint_executor errors after expunging a sled. (Note: before this, I had expunged some disks on a different sled and executed blueprint updates without issues.)

root@oxz_switch0:~# omdb -w nexus sleds expunge 104dc345-936a-4652-97da-374ae06c30e5
...
WARNING: sled 104dc345-936a-4652-97da-374ae06c30e5 is PRESENT in the most recent inventory collection (spotted at 2025-01-17 19:13:34.490901 UTC). It is dangerous to expunge a sled that is still running. Are you sure you want to proceed anyway?
y/N〉y
WARNING: This operation will PERMANENTLY and IRRECOVABLY mark sled 104dc345-936a-4652-97da-374ae06c30e5 (BRM27230037) expunged. To proceed, type the sled's serial number.
sled serial number〉BRM27230037
expunged sled 104dc345-936a-4652-97da-374ae06c30e5 (previous policy: InService(Provisionable))

root@oxz_switch0:~# omdb -w nexus blueprints regenerate  
generated new blueprint 094f1432-d9b6-404a-b8e9-005f85c05143
root@oxz_switch0:~# omdb nexus blueprints diff current 094f1432-d9b6-404a-b8e9-005f85c05143
from: blueprint d9aa675b-5c44-486a-9490-2bb32de9908d
to:   blueprint 094f1432-d9b6-404a-b8e9-005f85c05143

 UNCHANGED SLEDS:

  sled 2525e504-c8e3-442c-9f3c-196353e22050 (active):

    physical disks at generation 2:
    -----------------------------------
    vendor   model             serial  
    -----------------------------------
    1b96     WUS4C6432DSP3X3   A079DDCD
    1b96     WUS4C6432DSP3X3   A079DF2B
    1b96     WUS4C6432DSP3X3   A079DF42
    1b96     WUS4C6432DSP3X3   A079DF5A
    1b96     WUS4C6432DSP3X3   A079DF5C
    1b96     WUS4C6432DSP3X3   A079DFFB
    1b96     WUS4C6432DSP3X3   A079E3AC
    1b96     WUS4C6432DSP3X3   A079E3D6
    1b96     WUS4C6432DSP3X3   A079E435
    1b96     WUS4C6432DSP3X3   A079E4A7


    datasets at generation 2:
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    dataset name                                                                                                   dataset uuid                           quota     reservation   compression
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    oxp_57f0f5a5-8997-414c-8bec-6e89cc9ff648/crypt/cockroachdb                                                     7d170ecd-7c2a-4f9f-87f3-2a9328ebae4c   none      none          off        
    oxp_d6737101-88a6-4c20-8816-a029596dd9b5/crypt/cockroachdb                                                     a614909e-e380-4dd8-b2d3-c3d238a306a3   none      none          off        
    oxp_035cfe4a-3ca6-4f44-9e31-0c0aa6b9e3b9/crucible                                                              cf0d6084-2176-4546-881e-c8005aa5982a   none      none          off        
    oxp_1bf8d707-f1fe-40f7-b1a6-9361fb8db1e4/crucible                                                              340d032c-a341-4370-bf5c-c3c3f46fc920   none      none          off        
    oxp_57f0f5a5-8997-414c-8bec-6e89cc9ff648/crucible                                                              261f6e4c-b24c-493f-ae68-9d6130e0a609   none      none          off        
    oxp_7937ad82-58a2-45be-bb56-d6ace139f563/crucible                                                              61e7a398-772d-4b4b-a959-a2ab783e3e4c   none      none          off        
    oxp_9224b772-6b18-4e51-852d-a4bedd495130/crucible                                                              ce054cc2-4ddd-41df-a1bb-60074b8b9d16   none      none          off        
    oxp_9d5e6bd1-e7aa-4cd8-b89d-c409d1159dfd/crucible                                                              c6ce3953-9450-451b-a670-c7a853d69505   none      none          off        
    oxp_ba54186f-1f3b-4b91-9cb4-9dcd9cac763b/crucible                                                              58b2a1e2-5027-4fff-8d0b-5d3169e36079   none      none          off        
    oxp_bb7b9d13-d372-4546-9729-9dfea43dd71e/crucible                                                              2abb774c-b9d3-49ba-96a6-925dc102168d   none      none          off        
    oxp_cb5c0bd1-65fa-4b0e-9f82-0ea67d36f0ac/crucible                                                              ba11d065-7429-47b8-87a6-b025ff6e5e5b   none      none          off        
    oxp_d6737101-88a6-4c20-8816-a029596dd9b5/crucible                                                              cb7ce2c3-ee07-4ea9-9c23-337aa4671592   none      none          off        
    oxp_035cfe4a-3ca6-4f44-9e31-0c0aa6b9e3b9/crypt/external_dns                                                    0b62587b-bba5-4f32-82ce-6cb87e15ae1b   none      none          off        
    oxp_035cfe4a-3ca6-4f44-9e31-0c0aa6b9e3b9/crypt/internal_dns                                                    d86ef576-741e-471c-9e03-d53288fa1a82   none      none          off        
    oxp_035cfe4a-3ca6-4f44-9e31-0c0aa6b9e3b9/crypt/zone                                                            9b3af7b1-82f2-484b-9110-68d1ab9530d9   none      none          off        
    oxp_1bf8d707-f1fe-40f7-b1a6-9361fb8db1e4/crypt/zone                                                            51ad44ae-b019-4b5e-8c91-34b77defcebe   none      none          off        
    oxp_57f0f5a5-8997-414c-8bec-6e89cc9ff648/crypt/zone                                                            9435449a-6758-4fb4-b20a-a587760e5b43   none      none          off        
    oxp_7937ad82-58a2-45be-bb56-d6ace139f563/crypt/zone                                                            5b30116c-2adf-4c20-83de-a9fa9ce02878   none      none          off        
    oxp_9224b772-6b18-4e51-852d-a4bedd495130/crypt/zone                                                            cefefba2-9d83-46f0-97c1-0d65ed8183df   none      none          off        
    oxp_9d5e6bd1-e7aa-4cd8-b89d-c409d1159dfd/crypt/zone                                                            34c3a6cf-2e77-40c6-976f-3760c43337ed   none      none          off        
    oxp_ba54186f-1f3b-4b91-9cb4-9dcd9cac763b/crypt/zone                                                            ccf32ab1-828f-48be-b82c-c4f3950d9a50   none      none          off        
    oxp_bb7b9d13-d372-4546-9729-9dfea43dd71e/crypt/zone                                                            e1617453-9d01-43fc-9e66-71a929985ec1   none      none          off        
    oxp_cb5c0bd1-65fa-4b0e-9f82-0ea67d36f0ac/crypt/zone                                                            45aa3b77-c862-44b5-97ad-d920a879d010   none      none          off        
    oxp_d6737101-88a6-4c20-8816-a029596dd9b5/crypt/zone                                                            4b18f6bd-d3fc-4105-8aeb-f685604f8991   none      none          off        
    oxp_d6737101-88a6-4c20-8816-a029596dd9b5/crypt/zone/oxz_cockroachdb_739771f0-629b-4136-a084-8a50f5f82ea7       f04dcd9c-8539-4cf4-98ca-3508fe2c9ce3   none      none          off        
    oxp_57f0f5a5-8997-414c-8bec-6e89cc9ff648/crypt/zone/oxz_cockroachdb_e5a790a9-60da-4c40-acbf-a83c4099af5d       27795a4d-b40c-4c12-b3e6-24b6d029d310   none      none          off        
    oxp_cb5c0bd1-65fa-4b0e-9f82-0ea67d36f0ac/crypt/zone/oxz_crucible_1246fe04-1c41-4220-b806-c89306cb3cf2          0de2724d-ccf5-40d1-815a-1342d37cde40   none      none          off        
    oxp_9d5e6bd1-e7aa-4cd8-b89d-c409d1159dfd/crypt/zone/oxz_crucible_19a39daf-a586-423d-9698-777d95c312c1          171cec02-6704-485b-9d06-5922ed42a107   none      none          off        
    oxp_bb7b9d13-d372-4546-9729-9dfea43dd71e/crypt/zone/oxz_crucible_32b81dc6-d1a1-4d3b-b664-253ae196def4          80653640-1496-4ca0-a3b1-926c7b0e9902   none      none          off        
    oxp_035cfe4a-3ca6-4f44-9e31-0c0aa6b9e3b9/crypt/zone/oxz_crucible_5652f74f-9666-41de-82dc-79c9f0d2750f          8d07a092-cee6-4e96-93de-519a95c7a170   none      none          off        
    oxp_9224b772-6b18-4e51-852d-a4bedd495130/crypt/zone/oxz_crucible_66cf1b7c-27ef-43c2-8df2-ec293ce5aed9          a20788a9-e722-41e1-803d-72047901c465   none      none          off        
    oxp_7937ad82-58a2-45be-bb56-d6ace139f563/crypt/zone/oxz_crucible_8314b50e-13db-49d2-a5b1-1e4450dba39a          1c80bb88-b756-4ad9-a685-c0f1a4d3aff8   none      none          off        
    oxp_d6737101-88a6-4c20-8816-a029596dd9b5/crypt/zone/oxz_crucible_9dec176e-04ef-43ee-b9aa-b18a0b694e7c          9dcc73ca-ee0e-4ec8-9dcc-477418082b9f   none      none          off        
    oxp_ba54186f-1f3b-4b91-9cb4-9dcd9cac763b/crypt/zone/oxz_crucible_ef08b715-a8e3-4da9-9170-a7f2eff5de2b          4081ac10-152b-4de1-8314-ee833e153c13   none      none          off        
    oxp_1bf8d707-f1fe-40f7-b1a6-9361fb8db1e4/crypt/zone/oxz_crucible_fbe7407a-1407-4311-b0cb-8f780ce4afb2          68630cf3-a026-4cc0-9ddf-a0f44e1866bf   none      none          off        
    oxp_57f0f5a5-8997-414c-8bec-6e89cc9ff648/crypt/zone/oxz_crucible_fe17dd96-6fa0-40a4-bd50-0b23bf518864          cea819b8-89a9-4a53-b3b7-a8edd14526a5   none      none          off        
    oxp_035cfe4a-3ca6-4f44-9e31-0c0aa6b9e3b9/crypt/zone/oxz_crucible_pantry_31b119f8-14db-492f-a56f-890956ed9a34   8fea3356-bfc0-443b-b150-774328bd1e9c   none      none          off        
    oxp_035cfe4a-3ca6-4f44-9e31-0c0aa6b9e3b9/crypt/zone/oxz_external_dns_b3b6594d-4a4b-4f57-bf93-753a94d5bfde      59a5635d-8a9a-4c31-b15e-bb05f2838cc4   none      none          off        
    oxp_035cfe4a-3ca6-4f44-9e31-0c0aa6b9e3b9/crypt/zone/oxz_internal_dns_3c111c23-672d-42e1-834f-977a09255229      fff6d995-4401-4a34-bbcc-6bf85d8c8ebd   none      none          off        
    oxp_cb5c0bd1-65fa-4b0e-9f82-0ea67d36f0ac/crypt/zone/oxz_nexus_781c3f21-fc5a-49b4-baf7-51a91a66d1eb             9896a956-7947-446b-9279-4dd963b73eb5   none      none          off        
    oxp_9d5e6bd1-e7aa-4cd8-b89d-c409d1159dfd/crypt/zone/oxz_ntp_580f94db-6e8f-4a33-8102-115bd227b104               b59c96b5-d7ae-4931-a4c5-161ee381238a   none      none          off        
    oxp_035cfe4a-3ca6-4f44-9e31-0c0aa6b9e3b9/crypt/debug                                                           62888f37-9d2c-4213-ac89-1b3f93ce5856   100 GiB   none          gzip-9     
    oxp_1bf8d707-f1fe-40f7-b1a6-9361fb8db1e4/crypt/debug                                                           58c4764d-ebd4-4abe-a868-c4bc623234d9   100 GiB   none          gzip-9     
    oxp_57f0f5a5-8997-414c-8bec-6e89cc9ff648/crypt/debug                                                           5e3c2920-4d87-4977-a4b4-7f8cdbb265c4   100 GiB   none          gzip-9     
    oxp_7937ad82-58a2-45be-bb56-d6ace139f563/crypt/debug                                                           e643a75b-de55-49ba-a447-f17b46db22ad   100 GiB   none          gzip-9     
    oxp_9224b772-6b18-4e51-852d-a4bedd495130/crypt/debug                                                           4c6fe586-63cf-4197-a65a-56c700a07794   100 GiB   none          gzip-9     
    oxp_9d5e6bd1-e7aa-4cd8-b89d-c409d1159dfd/crypt/debug                                                           4ea4b928-9a09-4156-a340-63a8f72d4e6b   100 GiB   none          gzip-9     
    oxp_ba54186f-1f3b-4b91-9cb4-9dcd9cac763b/crypt/debug                                                           34ba7296-a7e5-44e2-b18d-c6c45b3e2da0   100 GiB   none          gzip-9     
    oxp_bb7b9d13-d372-4546-9729-9dfea43dd71e/crypt/debug                                                           748239af-cf0f-4324-95a6-d10aadf5d5c8   100 GiB   none          gzip-9     
    oxp_cb5c0bd1-65fa-4b0e-9f82-0ea67d36f0ac/crypt/debug                                                           18699d81-3e50-4b09-a651-7db44e7b028e   100 GiB   none          gzip-9     
    oxp_d6737101-88a6-4c20-8816-a029596dd9b5/crypt/debug                                                           b0910a7c-9e53-4150-826a-1b686676783e   100 GiB   none          gzip-9     


    omicron zones at generation 6:
    ---------------------------------------------------------------------------------------------
    zone type         zone id                                disposition   underlay IP           
    ---------------------------------------------------------------------------------------------
    cockroach_db      739771f0-629b-4136-a084-8a50f5f82ea7   in service    fd00:1122:3344:104::3 
    cockroach_db      e5a790a9-60da-4c40-acbf-a83c4099af5d   in service    fd00:1122:3344:104::4 
    crucible          1246fe04-1c41-4220-b806-c89306cb3cf2   in service    fd00:1122:3344:104::d 
    crucible          19a39daf-a586-423d-9698-777d95c312c1   in service    fd00:1122:3344:104::10
    crucible          32b81dc6-d1a1-4d3b-b664-253ae196def4   in service    fd00:1122:3344:104::c 
    crucible          5652f74f-9666-41de-82dc-79c9f0d2750f   in service    fd00:1122:3344:104::e 
    crucible          66cf1b7c-27ef-43c2-8df2-ec293ce5aed9   in service    fd00:1122:3344:104::a 
    crucible          8314b50e-13db-49d2-a5b1-1e4450dba39a   in service    fd00:1122:3344:104::f 
    crucible          9dec176e-04ef-43ee-b9aa-b18a0b694e7c   in service    fd00:1122:3344:104::7 
    crucible          ef08b715-a8e3-4da9-9170-a7f2eff5de2b   in service    fd00:1122:3344:104::9 
    crucible          fbe7407a-1407-4311-b0cb-8f780ce4afb2   in service    fd00:1122:3344:104::b 
    crucible          fe17dd96-6fa0-40a4-bd50-0b23bf518864   in service    fd00:1122:3344:104::8 
    crucible_pantry   31b119f8-14db-492f-a56f-890956ed9a34   in service    fd00:1122:3344:104::6 
    external_dns      b3b6594d-4a4b-4f57-bf93-753a94d5bfde   in service    fd00:1122:3344:104::21
    internal_dns      3c111c23-672d-42e1-834f-977a09255229   in service    fd00:1122:3344:1::1   
    internal_ntp      580f94db-6e8f-4a33-8102-115bd227b104   in service    fd00:1122:3344:104::11
    nexus             781c3f21-fc5a-49b4-baf7-51a91a66d1eb   in service    fd00:1122:3344:104::5 


  sled 32049c15-d684-4738-9c64-9a79485cf88f (active):

    physical disks at generation 2:
    -----------------------------------
    vendor   model             serial  
    -----------------------------------
    1b96     WUS4C6432DSP3X3   A079DDFD
    1b96     WUS4C6432DSP3X3   A079DE08
    1b96     WUS4C6432DSP3X3   A079DE11
    1b96     WUS4C6432DSP3X3   A079DEAF
    1b96     WUS4C6432DSP3X3   A079DF11
    1b96     WUS4C6432DSP3X3   A079DFA7
    1b96     WUS4C6432DSP3X3   A079DFCA
    1b96     WUS4C6432DSP3X3   A079E02E
    1b96     WUS4C6432DSP3X3   A079E076


    datasets at generation 2:
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    dataset name                                                                                                   dataset uuid                           quota     reservation   compression
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    oxp_594ae870-a7b1-4353-ba60-9c253d915873/crypt/cockroachdb                                                     c9fb08b1-2be9-48d9-8e74-082f94168d09   none      none          off        
    oxp_5e801c24-db06-4657-af84-1e2ac5f85e8b/crypt/cockroachdb                                                     f04a013d-6f74-40ff-a492-fa83bc92725a   none      none          off        
    oxp_594ae870-a7b1-4353-ba60-9c253d915873/crucible                                                              7ad5e17c-fa07-413a-91d0-ef953576bedd   none      none          off        
    oxp_5e801c24-db06-4657-af84-1e2ac5f85e8b/crucible                                                              6c45fdcb-2955-4742-8dc3-407230818198   none      none          off        
    oxp_78ce9615-e1cb-487f-b165-95164d1408d5/crucible                                                              ec84e22b-36d7-4642-9b14-694ea3d1527a   none      none          off        
    oxp_981b1696-1dd6-40e9-b5f2-63e167c2f195/crucible                                                              5d4fef3c-f53c-454a-b472-24e56b3b4974   none      none          off        
    oxp_9f0ef96f-3c61-4681-a887-571f5ccd83ca/crucible                                                              cf29042d-3e79-44fa-bf01-fb1da85ac248   none      none          off        
    oxp_cfc4fef3-8077-4f97-84d5-b14107ef3771/crucible                                                              01497876-81b1-4ce8-97af-ea8867ff41f2   none      none          off        
    oxp_d9ce6a4c-bfb8-42ec-960a-2c5ba662da40/crucible                                                              fba12abc-d73f-4479-993a-36a8799a5f71   none      none          off        
    oxp_e800c4cf-7206-442b-b65d-abc02b7108c0/crucible                                                              2b1a9c80-6dc8-4e00-991b-ee44eed3ec29   none      none          off        
    oxp_ff5761f9-e6b6-42de-918b-59934b4d51b7/crucible                                                              7b378d9d-aac2-421b-acf2-06b345dc5f40   none      none          off        
    oxp_5e801c24-db06-4657-af84-1e2ac5f85e8b/crypt/internal_dns                                                    940db9ab-d78b-4d84-bf48-a0db5350c510   none      none          off        
    oxp_594ae870-a7b1-4353-ba60-9c253d915873/crypt/zone                                                            43fc073d-96fe-47a4-8ac7-7dba97d106b6   none      none          off        
    oxp_5e801c24-db06-4657-af84-1e2ac5f85e8b/crypt/zone                                                            aff1ba1c-6c9c-49d2-a4af-314fcdde85a2   none      none          off        
    oxp_78ce9615-e1cb-487f-b165-95164d1408d5/crypt/zone                                                            e71c44af-d298-449c-93f1-b229a633afac   none      none          off        
    oxp_981b1696-1dd6-40e9-b5f2-63e167c2f195/crypt/zone                                                            5cf52b87-686d-4274-b802-37518ddaaf3c   none      none          off        
    oxp_9f0ef96f-3c61-4681-a887-571f5ccd83ca/crypt/zone                                                            2248b8e0-308d-4189-bf82-10fdecd1498d   none      none          off        
    oxp_cfc4fef3-8077-4f97-84d5-b14107ef3771/crypt/zone                                                            5fa5f43d-94be-44a5-ad80-4ef8ca7b5464   none      none          off        
    oxp_d9ce6a4c-bfb8-42ec-960a-2c5ba662da40/crypt/zone                                                            bdf3121e-6b22-4ab7-9651-fe4746f502a7   none      none          off        
    oxp_e800c4cf-7206-442b-b65d-abc02b7108c0/crypt/zone                                                            8d60ecc4-fe8c-410f-b310-edbe1f432c18   none      none          off        
    oxp_ff5761f9-e6b6-42de-918b-59934b4d51b7/crypt/zone                                                            1f6a9581-ff55-4f31-a37a-e74ae5dd9359   none      none          off        
    oxp_5e801c24-db06-4657-af84-1e2ac5f85e8b/crypt/zone/oxz_cockroachdb_3cf5208e-25ec-4d65-b3ac-68671b17eb18       7d5b321d-214b-4e37-9d5f-72b53430c4d7   none      none          off        
    oxp_594ae870-a7b1-4353-ba60-9c253d915873/crypt/zone/oxz_cockroachdb_c6a00d80-6287-447f-9e11-ad40baf15378       51040978-e338-4a15-ae6f-8b4b972befd4   none      none          off        
    oxp_9f0ef96f-3c61-4681-a887-571f5ccd83ca/crypt/zone/oxz_crucible_0a31987f-8f50-4f14-94ed-8362fc44f4fd          6f2858b1-75e0-4e28-9484-df2be8398916   none      none          off        
    oxp_78ce9615-e1cb-487f-b165-95164d1408d5/crypt/zone/oxz_crucible_1b7902b2-db97-42f4-9498-19d37e3cf049          a1877139-1cbe-4ae0-a73e-bdfdd98906d9   none      none          off        
    oxp_ff5761f9-e6b6-42de-918b-59934b4d51b7/crypt/zone/oxz_crucible_31ca08ba-4215-4838-8077-669c25104b8c          2f25942b-f49f-4d2b-b007-a2a1797f148e   none      none          off        
    oxp_cfc4fef3-8077-4f97-84d5-b14107ef3771/crypt/zone/oxz_crucible_44d276ff-ffdc-4a3b-b662-8d64ad3b7bda          7fe3be8b-13d3-45dc-a11a-30352c8f843c   none      none          off        
    oxp_594ae870-a7b1-4353-ba60-9c253d915873/crypt/zone/oxz_crucible_612adb66-04aa-428e-a5cb-ed192d511c43          907aa306-e7f0-4f03-9d57-60a58d658130   none      none          off        
    oxp_5e801c24-db06-4657-af84-1e2ac5f85e8b/crypt/zone/oxz_crucible_b31368f9-6725-4e45-834a-3ef6f8438056          bd3f8a5f-51f5-415f-84a4-82286cbd0694   none      none          off        
    oxp_981b1696-1dd6-40e9-b5f2-63e167c2f195/crypt/zone/oxz_crucible_cee75c02-a505-40f4-b80d-12e5fc9b1e12          7185a855-c5fb-4eab-980e-bcbcc3c70191   none      none          off        
    oxp_d9ce6a4c-bfb8-42ec-960a-2c5ba662da40/crypt/zone/oxz_crucible_ec307cc8-95bd-4f00-a2bb-20e9e76b6261          6ed24a7b-d7b2-4beb-8a1e-1166702ca2f3   none      none          off        
    oxp_e800c4cf-7206-442b-b65d-abc02b7108c0/crypt/zone/oxz_crucible_f1a5c895-0451-42dc-83e2-9cd396fa79d0          06f0296f-73cd-4d71-b841-9229ed1f6137   none      none          off        
    oxp_e800c4cf-7206-442b-b65d-abc02b7108c0/crypt/zone/oxz_crucible_pantry_422985a5-8237-439a-b4d7-dde69f6b65ca   a5ecda3f-1f25-4044-a7a7-b83a950cb1fc   none      none          off        
    oxp_5e801c24-db06-4657-af84-1e2ac5f85e8b/crypt/zone/oxz_internal_dns_e7501a9c-09ee-449a-a596-855cdb1aebc0      b7d869e1-fa16-402c-8d91-f974c08dc25e   none      none          off        
    oxp_cfc4fef3-8077-4f97-84d5-b14107ef3771/crypt/zone/oxz_nexus_41075b06-a9b3-4a05-8d15-1af00b73f8f6             f1332baa-4ea6-4af7-b0d1-ae5a3f21d3da   none      none          off        
    oxp_594ae870-a7b1-4353-ba60-9c253d915873/crypt/zone/oxz_ntp_9d147f42-cd82-463a-9a38-42ca5f4f5f2a               1cbbecd2-9dee-449f-9447-fb567e3dda88   none      none          off        
    oxp_594ae870-a7b1-4353-ba60-9c253d915873/crypt/zone/oxz_ntp_ba79bd52-d940-4ed9-a602-90b5af475662               00f3e595-dfd2-4889-986f-f0e874eac5a9   none      none          off        
    oxp_594ae870-a7b1-4353-ba60-9c253d915873/crypt/debug                                                           9ccebea8-6ec9-48db-9801-b79e684c958f   100 GiB   none          gzip-9     
    oxp_5e801c24-db06-4657-af84-1e2ac5f85e8b/crypt/debug                                                           b2a3ac9c-90fd-4e95-93df-eb5047832b03   100 GiB   none          gzip-9     
    oxp_78ce9615-e1cb-487f-b165-95164d1408d5/crypt/debug                                                           d086c8a3-f3c5-436f-a089-9e8237b25c02   100 GiB   none          gzip-9     
    oxp_981b1696-1dd6-40e9-b5f2-63e167c2f195/crypt/debug                                                           e0e0986f-8215-4f1c-b384-bc38179b3872   100 GiB   none          gzip-9     
    oxp_9f0ef96f-3c61-4681-a887-571f5ccd83ca/crypt/debug                                                           dba9a186-6334-410e-8b6d-9bbab062ba4e   100 GiB   none          gzip-9     
    oxp_cfc4fef3-8077-4f97-84d5-b14107ef3771/crypt/debug                                                           4ca793b0-9498-4cc7-9634-cece0aacaec5   100 GiB   none          gzip-9     
    oxp_d9ce6a4c-bfb8-42ec-960a-2c5ba662da40/crypt/debug                                                           7e3bc0f8-404e-404c-9a67-d66bc9ffacda   100 GiB   none          gzip-9     
    oxp_e800c4cf-7206-442b-b65d-abc02b7108c0/crypt/debug                                                           8e84be63-93bd-4b14-aeb5-2b2cc85763b0   100 GiB   none          gzip-9     
    oxp_ff5761f9-e6b6-42de-918b-59934b4d51b7/crypt/debug                                                           1d928192-a8f5-4b8c-b309-30ee9229d5ca   100 GiB   none          gzip-9     


    omicron zones at generation 6:
    ---------------------------------------------------------------------------------------------
    zone type         zone id                                disposition   underlay IP           
    ---------------------------------------------------------------------------------------------
    boundary_ntp      ba79bd52-d940-4ed9-a602-90b5af475662   in service    fd00:1122:3344:103::21
    cockroach_db      3cf5208e-25ec-4d65-b3ac-68671b17eb18   in service    fd00:1122:3344:103::3 
    cockroach_db      c6a00d80-6287-447f-9e11-ad40baf15378   in service    fd00:1122:3344:103::22
    crucible          0a31987f-8f50-4f14-94ed-8362fc44f4fd   in service    fd00:1122:3344:103::d 
    crucible          1b7902b2-db97-42f4-9498-19d37e3cf049   in service    fd00:1122:3344:103::9 
    crucible          31ca08ba-4215-4838-8077-669c25104b8c   in service    fd00:1122:3344:103::c 
    crucible          44d276ff-ffdc-4a3b-b662-8d64ad3b7bda   in service    fd00:1122:3344:103::a 
    crucible          612adb66-04aa-428e-a5cb-ed192d511c43   in service    fd00:1122:3344:103::e 
    crucible          b31368f9-6725-4e45-834a-3ef6f8438056   in service    fd00:1122:3344:103::6 
    crucible          cee75c02-a505-40f4-b80d-12e5fc9b1e12   in service    fd00:1122:3344:103::b 
    crucible          ec307cc8-95bd-4f00-a2bb-20e9e76b6261   in service    fd00:1122:3344:103::7 
    crucible          f1a5c895-0451-42dc-83e2-9cd396fa79d0   in service    fd00:1122:3344:103::8 
    crucible_pantry   422985a5-8237-439a-b4d7-dde69f6b65ca   in service    fd00:1122:3344:103::5 
    internal_dns      e7501a9c-09ee-449a-a596-855cdb1aebc0   in service    fd00:1122:3344:3::1   
    internal_ntp      9d147f42-cd82-463a-9a38-42ca5f4f5f2a   expunged      fd00:1122:3344:103::f 
    nexus             41075b06-a9b3-4a05-8d15-1af00b73f8f6   in service    fd00:1122:3344:103::4 


 MODIFIED SLEDS:

  sled 104dc345-936a-4652-97da-374ae06c30e5 (active -> decommissioned):

    physical disks from generation 2:
    -----------------------------------
    vendor   model             serial  
    -----------------------------------
-   1b96     WUS4C6432DSP3X3   A079DE84
-   1b96     WUS4C6432DSP3X3   A079DEE9
-   1b96     WUS4C6432DSP3X3   A079DF1E
-   1b96     WUS4C6432DSP3X3   A079DFBF
-   1b96     WUS4C6432DSP3X3   A079E184
-   1b96     WUS4C6432DSP3X3   A079E342
-   1b96     WUS4C6432DSP3X3   A079E35A
-   1b96     WUS4C6432DSP3X3   A079E3AE
-   1b96     WUS4C6432DSP3X3   A079E708
-   1b96     WUS4C6432DSP3X3   A084A7EA


    datasets from generation 4:
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    dataset name                                                                                                   dataset uuid                           quota     reservation   compression
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crypt/cockroachdb                                                     57c85ce1-f40a-4154-9479-bb27476abb5b   none      none          off        
-   oxp_034d725a-c364-43e7-871f-b4f23b9a2e93/crucible                                                              c00b2069-c916-4e63-afe6-84477aacdb5e   none      none          off        
-   oxp_1da50296-740e-489a-9ab6-b8d66168ab56/crucible                                                              b4177baa-be6b-4c3e-981d-1a8a7fc9e36d   none      none          off        
-   oxp_28f9edd4-d8c6-4def-b564-38aaf4d9c5b3/crucible                                                              c41c138b-f5f4-402b-ad21-c71e3dc29cca   none      none          off        
-   oxp_345ebb46-2ad8-4f00-82d0-af7018a8225b/crucible                                                              1f477c40-558d-4e20-98f9-6482f968e68c   none      none          off        
-   oxp_376e5f6d-ecd2-4fde-beae-5bc8a0b12ccd/crucible                                                              52b7b1b9-b20f-4bd1-b38c-0db16af0c0d5   none      none          off        
-   oxp_5509295b-487d-47eb-8c97-285ec05f4bd2/crucible                                                              6dd25aa6-3862-4907-b0d3-8529de0b2515   none      none          off        
-   oxp_8d1d9570-3a18-496f-b993-b7910dc0fe51/crucible                                                              f428eb2b-e267-4c25-be27-62d0ef0ad7de   none      none          off        
-   oxp_8f16283f-82ee-4360-b259-1ad890be2c31/crucible                                                              5b4c104c-e1f4-4ead-a2cc-b2019ff2c484   none      none          off        
-   oxp_95130050-24fc-42b0-a4b0-db7c0dcd3531/crucible                                                              23962dbe-798f-4a67-b129-3b9382f4b389   none      none          off        
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crucible                                                              28783922-510e-41f4-a01e-3a4310c94a83   none      none          off        
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crypt/clickhouse                                                      a1f5c390-59f2-4c47-a305-059fd14a165c   none      none          off        
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crypt/internal_dns                                                    ac32cf8d-57f3-4b44-9e92-7315be5bbcd2   none      none          off        
-   oxp_034d725a-c364-43e7-871f-b4f23b9a2e93/crypt/zone                                                            412dd06b-c314-4447-8f5f-92434ebcd5f2   none      none          off        
-   oxp_1da50296-740e-489a-9ab6-b8d66168ab56/crypt/zone                                                            33352668-fd7e-4926-a319-4567152172ff   none      none          off        
-   oxp_28f9edd4-d8c6-4def-b564-38aaf4d9c5b3/crypt/zone                                                            c9848939-b724-4fbb-964c-9faa6e688bc0   none      none          off        
-   oxp_345ebb46-2ad8-4f00-82d0-af7018a8225b/crypt/zone                                                            1abcf4a6-447f-45d7-8a3a-c6708fc74553   none      none          off        
-   oxp_376e5f6d-ecd2-4fde-beae-5bc8a0b12ccd/crypt/zone                                                            6a57e3cc-a7d7-48c7-9022-f380db79208b   none      none          off        
-   oxp_5509295b-487d-47eb-8c97-285ec05f4bd2/crypt/zone                                                            b8ebd9fa-c8b9-4969-a7ca-50bb003665cb   none      none          off        
-   oxp_8d1d9570-3a18-496f-b993-b7910dc0fe51/crypt/zone                                                            2fc5ed1c-89a8-48e9-943e-6d2146b3fad2   none      none          off        
-   oxp_8f16283f-82ee-4360-b259-1ad890be2c31/crypt/zone                                                            53ce80fe-13c3-415b-8747-afe026502dff   none      none          off        
-   oxp_95130050-24fc-42b0-a4b0-db7c0dcd3531/crypt/zone                                                            f2d0cbc4-d701-450c-9344-00ca1ac7dd24   none      none          off        
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crypt/zone                                                            f655ca02-98a2-4ff4-944e-7dbf9fea7e90   none      none          off        
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crypt/zone/oxz_clickhouse_6d2cbfa9-1017-4d47-8fe2-53d88e4b129c        1fa334dd-0a72-4415-ac2b-e31c844b2e28   none      none          off        
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crypt/zone/oxz_cockroachdb_9390454a-4c9d-4caf-9214-c455ad5284fd       5deb6a32-3df0-4295-8154-66476215e1bd   none      none          off        
-   oxp_1da50296-740e-489a-9ab6-b8d66168ab56/crypt/zone/oxz_crucible_04ab58ea-3222-4e9e-bf2b-86883fa5657e          fcb3642e-28d5-483a-bc69-a6fc8d037cf0   none      none          off        
-   oxp_345ebb46-2ad8-4f00-82d0-af7018a8225b/crypt/zone/oxz_crucible_21c51a86-1f16-4b3a-b5f2-dc1d115df109          0a982c64-4678-499c-9b25-96182a3e42f9   none      none          off        
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crypt/zone/oxz_crucible_37bf1507-60c8-40d6-9bde-f98de9d0eafe          e81000ac-8e12-4db1-abb3-6b05e388758e   none      none          off        
-   oxp_8d1d9570-3a18-496f-b993-b7910dc0fe51/crypt/zone/oxz_crucible_3a5b7fef-576e-4e80-8a59-5496637488aa          9fe7b6ff-d32f-474e-a2d6-e2c5493f366b   none      none          off        
-   oxp_376e5f6d-ecd2-4fde-beae-5bc8a0b12ccd/crypt/zone/oxz_crucible_5c532c03-d7cb-46b0-8d72-b35a35551554          7dd7c549-1271-45e9-a9c5-15013e692567   none      none          off        
-   oxp_5509295b-487d-47eb-8c97-285ec05f4bd2/crypt/zone/oxz_crucible_68801bb9-ef55-4aa7-bdd8-3c1379d43769          20683571-dec6-44b8-8581-2ae0cd0a0eaf   none      none          off        
-   oxp_28f9edd4-d8c6-4def-b564-38aaf4d9c5b3/crypt/zone/oxz_crucible_9c770ac8-257e-4ec2-a445-87d257894261          389c8dd2-224c-4912-b5dc-7dc4970bb668   none      none          off        
-   oxp_8f16283f-82ee-4360-b259-1ad890be2c31/crypt/zone/oxz_crucible_aa5b6217-e7a2-405b-af4a-83760398f62a          bf5339ea-fea3-4d13-8003-694ab8d49d08   none      none          off        
-   oxp_034d725a-c364-43e7-871f-b4f23b9a2e93/crypt/zone/oxz_crucible_e29b1051-57c9-4b9b-b408-bc471a5c5740          8c4e8505-9bc4-4bdc-afcb-467ee37f2f23   none      none          off        
-   oxp_95130050-24fc-42b0-a4b0-db7c0dcd3531/crypt/zone/oxz_crucible_fa63cc96-5e4d-4ebd-9836-1985674b4a6b          244f7fad-6005-42ab-bb6c-19ac307411e9   none      none          off        
-   oxp_034d725a-c364-43e7-871f-b4f23b9a2e93/crypt/zone/oxz_crucible_pantry_71795392-f4b2-4e09-9981-2baa44b6683c   c80ab0af-d6f4-42fe-b634-fb27645a1b56   none      none          off        
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crypt/zone/oxz_internal_dns_4e5c6461-e3e8-4ece-9597-2bd9c14f70ff      905c491e-4298-4576-97be-04ffab88634d   none      none          off        
-   oxp_034d725a-c364-43e7-871f-b4f23b9a2e93/crypt/zone/oxz_nexus_3ae94428-fe08-411f-8a38-368ad7472848             5df9d204-5d4e-4e52-8dcd-0a4160532582   none      none          off        
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crypt/zone/oxz_nexus_8c5cbf3e-5cd0-4247-92b4-e2b492be640f             30f3088b-651b-41fd-8bdb-fca436b79193   none      none          off        
-   oxp_034d725a-c364-43e7-871f-b4f23b9a2e93/crypt/zone/oxz_ntp_05d47b8d-f063-40f0-8513-0a0df404c9b3               6479df0f-d03e-4635-b77a-778fe8d8434a   none      none          off        
-   oxp_034d725a-c364-43e7-871f-b4f23b9a2e93/crypt/debug                                                           cda90111-2756-4fd1-82a2-2026bc6224cc   100 GiB   none          gzip-9     
-   oxp_1da50296-740e-489a-9ab6-b8d66168ab56/crypt/debug                                                           6e8f8f57-05aa-4daa-b463-88d4c9a764d3   100 GiB   none          gzip-9     
-   oxp_28f9edd4-d8c6-4def-b564-38aaf4d9c5b3/crypt/debug                                                           8b705110-40a2-43c9-84b6-a57be4f0434a   100 GiB   none          gzip-9     
-   oxp_345ebb46-2ad8-4f00-82d0-af7018a8225b/crypt/debug                                                           c8637b61-9139-489b-a593-332d4935f7b7   100 GiB   none          gzip-9     
-   oxp_376e5f6d-ecd2-4fde-beae-5bc8a0b12ccd/crypt/debug                                                           acd5ef0e-4fe1-4365-ae26-dcf45ac844e6   100 GiB   none          gzip-9     
-   oxp_5509295b-487d-47eb-8c97-285ec05f4bd2/crypt/debug                                                           209a03b8-3ec5-4b2c-ba73-a85b59068439   100 GiB   none          gzip-9     
-   oxp_8d1d9570-3a18-496f-b993-b7910dc0fe51/crypt/debug                                                           f3ddf408-cb3a-409f-8371-38546644cd83   100 GiB   none          gzip-9     
-   oxp_8f16283f-82ee-4360-b259-1ad890be2c31/crypt/debug                                                           00315c64-d0e7-4f8e-a539-4c2a5aa1f1cf   100 GiB   none          gzip-9     
-   oxp_95130050-24fc-42b0-a4b0-db7c0dcd3531/crypt/debug                                                           ce6f84dd-ab75-4534-a8c8-de477e3056e1   100 GiB   none          gzip-9     
-   oxp_cac4a634-1567-4810-b984-d2458cfa13e1/crypt/debug                                                           482059cc-4b6c-464c-a409-f930b8946943   100 GiB   none          gzip-9     


    omicron zones generation 8 -> 9:
    ----------------------------------------------------------------------------------------------
    zone type         zone id                                disposition    underlay IP           
    ----------------------------------------------------------------------------------------------
    nexus             3ae94428-fe08-411f-8a38-368ad7472848   expunged       fd00:1122:3344:102::21
*   boundary_ntp      05d47b8d-f063-40f0-8513-0a0df404c9b3   - in service   fd00:1122:3344:102::10
     └─                                                      + expunged                           
*   clickhouse        6d2cbfa9-1017-4d47-8fe2-53d88e4b129c   - in service   fd00:1122:3344:102::5 
     └─                                                      + expunged                           
*   cockroach_db      9390454a-4c9d-4caf-9214-c455ad5284fd   - in service   fd00:1122:3344:102::3 
     └─                                                      + expunged                           
*   crucible          04ab58ea-3222-4e9e-bf2b-86883fa5657e   - in service   fd00:1122:3344:102::e 
     └─                                                      + expunged                           
*   crucible          21c51a86-1f16-4b3a-b5f2-dc1d115df109   - in service   fd00:1122:3344:102::9 
     └─                                                      + expunged                           
*   crucible          37bf1507-60c8-40d6-9bde-f98de9d0eafe   - in service   fd00:1122:3344:102::6 
     └─                                                      + expunged                           
*   crucible          3a5b7fef-576e-4e80-8a59-5496637488aa   - in service   fd00:1122:3344:102::b 
     └─                                                      + expunged                           
*   crucible          5c532c03-d7cb-46b0-8d72-b35a35551554   - in service   fd00:1122:3344:102::8 
     └─                                                      + expunged                           
*   crucible          68801bb9-ef55-4aa7-bdd8-3c1379d43769   - in service   fd00:1122:3344:102::c 
     └─                                                      + expunged                           
*   crucible          9c770ac8-257e-4ec2-a445-87d257894261   - in service   fd00:1122:3344:102::a 
     └─                                                      + expunged                           
*   crucible          aa5b6217-e7a2-405b-af4a-83760398f62a   - in service   fd00:1122:3344:102::7 
     └─                                                      + expunged                           
*   crucible          e29b1051-57c9-4b9b-b408-bc471a5c5740   - in service   fd00:1122:3344:102::f 
     └─                                                      + expunged                           
*   crucible          fa63cc96-5e4d-4ebd-9836-1985674b4a6b   - in service   fd00:1122:3344:102::d 
     └─                                                      + expunged                           
*   crucible_pantry   71795392-f4b2-4e09-9981-2baa44b6683c   - in service   fd00:1122:3344:102::22
     └─                                                      + expunged                           
*   internal_dns      4e5c6461-e3e8-4ece-9597-2bd9c14f70ff   - in service   fd00:1122:3344:2::1   
     └─                                                      + expunged                           
*   nexus             8c5cbf3e-5cd0-4247-92b4-e2b492be640f   - in service   fd00:1122:3344:102::4 
     └─                                                      + expunged                           


  sled 61b7c861-993a-4590-9ca7-f506c59ee0a0 (active):

    physical disks at generation 6:
    -----------------------------------
    vendor   model             serial  
    -----------------------------------
    1b96     WUS4C6432DSP3X3   A079DE55
    1b96     WUS4C6432DSP3X3   A079E567


    datasets generation 5 -> 6:
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    dataset name                                                                                                   dataset uuid                           quota     reservation   compression
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    oxp_ef481864-f3e1-4add-a8f6-7e276b2cf9f9/crypt/cockroachdb                                                     8fb0d742-087f-4bbe-bc3e-3512a13ccba7   none      none          off        
    oxp_180ce178-96c5-404f-bf53-72fc0e224918/crucible                                                              8d91b747-5e19-4122-a5d9-7a56efbe9041   none      none          off        
    oxp_20f48049-0113-4ecb-9131-97b84407755e/crucible                                                              4d2d6b30-e4d7-437c-9ed2-223041780f26   none      none          off        
    oxp_3ce5fd63-3acf-4ab2-8d6a-94cef5726cb2/crucible                                                              6967b2c3-4648-4485-9015-a7ce6590a404   none      none          off        
    oxp_45c07047-0d6b-48c3-8683-abacb200c299/crucible                                                              d8da75c9-82fd-450c-a490-cce1c30d5e73   none      none          off        
    oxp_7388fbf2-f5a8-4bbd-9e9e-e6635dcf1dc1/crucible                                                              bc3dc868-bf2a-423f-9ede-7572ecc31130   none      none          off        
    oxp_7a1dbf01-5ee3-4d9c-9d0e-afaa477274e6/crucible                                                              a779e896-fecb-49da-ac6d-2ea776ed5821   none      none          off        
    oxp_869d00da-0c89-46ef-962a-62311c388b5f/crucible                                                              9acb4aff-4f65-49f3-8937-b0043eab63ca   none      none          off        
    oxp_b13438d8-60f9-412b-99a1-1ac31bb5c8da/crucible                                                              6abd23f9-9d25-41c6-a963-03819fba8595   none      none          off        
    oxp_ef481864-f3e1-4add-a8f6-7e276b2cf9f9/crucible                                                              fda33f1f-962b-45c3-b9a5-049928174ff7   none      none          off        
    oxp_ef481864-f3e1-4add-a8f6-7e276b2cf9f9/crypt/external_dns                                                    c2a6165f-a14f-453b-821b-078e2e307bf7   none      none          off        
    oxp_ef481864-f3e1-4add-a8f6-7e276b2cf9f9/crypt/internal_dns                                                    2521151c-313b-47fb-af56-540388274dc3   none      none          off        
    oxp_180ce178-96c5-404f-bf53-72fc0e224918/crypt/zone                                                            ff18f016-91e7-4ea3-962e-cdf87141606c   none      none          off        
    oxp_20f48049-0113-4ecb-9131-97b84407755e/crypt/zone                                                            3cf3f295-65c2-4636-ba25-1621980d5b8e   none      none          off        
    oxp_3ce5fd63-3acf-4ab2-8d6a-94cef5726cb2/crypt/zone                                                            dd96cb85-5af7-477e-9012-c7c8e60e275b   none      none          off        
    oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone                                                            45c28036-b15a-4dd8-b857-5c902d2486ea   none      none          off        
    oxp_7388fbf2-f5a8-4bbd-9e9e-e6635dcf1dc1/crypt/zone                                                            3c929f40-ab23-40e2-b26b-377426aadaf3   none      none          off        
    oxp_7a1dbf01-5ee3-4d9c-9d0e-afaa477274e6/crypt/zone                                                            d4d876fe-1782-4ca9-8ac2-791e82916c20   none      none          off        
    oxp_869d00da-0c89-46ef-962a-62311c388b5f/crypt/zone                                                            23c27ad8-1c93-440a-ad34-2ddc52a79ad5   none      none          off        
    oxp_b13438d8-60f9-412b-99a1-1ac31bb5c8da/crypt/zone                                                            d79e6361-8005-4001-bf7e-5ca24265e12a   none      none          off        
    oxp_ef481864-f3e1-4add-a8f6-7e276b2cf9f9/crypt/zone                                                            2a3ce365-8c7a-49df-b8ca-2412f14324e0   none      none          off        
    oxp_ef481864-f3e1-4add-a8f6-7e276b2cf9f9/crypt/zone/oxz_cockroachdb_7cdbdac2-b593-491b-8d49-be37b0d79cb0       f05e82ed-07d2-4b53-897d-8d1d4f61f217   none      none          off        
    oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_crucible_054eb2e1-f3c5-489e-8fa2-a8841cc7e314          d9189827-cae2-45ce-bcd6-5488040638bb   none      none          off        
    oxp_7388fbf2-f5a8-4bbd-9e9e-e6635dcf1dc1/crypt/zone/oxz_crucible_0cdc46de-ac07-4030-ba67-877904f1a128          0d839610-c5cf-401f-bb77-4da6e4ac7d42   none      none          off        
    oxp_3ce5fd63-3acf-4ab2-8d6a-94cef5726cb2/crypt/zone/oxz_crucible_40d52af0-2f5e-48c5-8c5b-85abfd184715          7fbd067b-ecb0-4a0b-88f1-1da54a83880d   none      none          off        
    oxp_869d00da-0c89-46ef-962a-62311c388b5f/crypt/zone/oxz_crucible_85e38d2e-98e7-4f85-b2a9-b10f1aa9ae00          566ed7ee-da6c-43d9-b5e7-c1a1972ef4be   none      none          off        
    oxp_20f48049-0113-4ecb-9131-97b84407755e/crypt/zone/oxz_crucible_8d0b659c-20d1-4e2b-9f44-d09aad60d3b6          96df5f88-fe47-4c15-abc7-c8213071a932   none      none          off        
    oxp_180ce178-96c5-404f-bf53-72fc0e224918/crypt/zone/oxz_crucible_92754cec-c2b3-475a-9b3a-a2f3f61a435d          5dbce939-a6c0-4191-a468-873b8df3de0d   none      none          off        
    oxp_7a1dbf01-5ee3-4d9c-9d0e-afaa477274e6/crypt/zone/oxz_crucible_c835a985-cd6f-4377-8239-9ef2dd1dd3a4          c3a88255-ac56-4886-99a5-1afef18aaf9d   none      none          off        
    oxp_ef481864-f3e1-4add-a8f6-7e276b2cf9f9/crypt/zone/oxz_crucible_cfb2dccf-5285-46ec-aa43-f45a7e7a9f00          d11f6d7d-dbe2-488d-8149-9d00e2f962b0   none      none          off        
    oxp_b13438d8-60f9-412b-99a1-1ac31bb5c8da/crypt/zone/oxz_crucible_d25bbfa4-09ea-47cd-99f9-b08ab0ccf27d          86beaadc-8f26-4079-8631-d90f039ada89   none      none          off        
    oxp_869d00da-0c89-46ef-962a-62311c388b5f/crypt/zone/oxz_crucible_pantry_27a6ffdf-1c83-4762-99dc-32b614168e2d   40d81802-5ecc-4f13-92ab-4a56175262f3   none      none          off        
    oxp_ef481864-f3e1-4add-a8f6-7e276b2cf9f9/crypt/zone/oxz_external_dns_c6033445-ee85-4d99-96ba-d53252db7e5e      f737c9e1-9c6f-4629-94f7-49dbd31f2a4d   none      none          off        
    oxp_ef481864-f3e1-4add-a8f6-7e276b2cf9f9/crypt/zone/oxz_internal_dns_adc881b8-8e0b-4078-ac24-bc80dfee7187      0b3f3361-1cc5-4de3-8904-7efb11631879   none      none          off        
    oxp_180ce178-96c5-404f-bf53-72fc0e224918/crypt/zone/oxz_ntp_2a1f8dff-e5e6-4489-8bb1-48fd194f3d68               7cc13599-3634-4277-bebe-9fca2b6acc16   none      none          off        
    oxp_3ce5fd63-3acf-4ab2-8d6a-94cef5726cb2/crypt/zone/oxz_ntp_afba98bf-83d2-4ed4-8c5e-7ae805d36e66               9fb4f431-f94c-477d-bb0b-8d67b06a68df   none      none          off        
    oxp_20f48049-0113-4ecb-9131-97b84407755e/crypt/zone/oxz_ntp_fad31fff-b59e-4089-bd57-1963f0288e18               f806add2-250c-422a-b04e-aeb373325a66   none      none          off        
    oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_oximeter_2d621598-a72b-4524-8f94-7c3f5eec4aa6          892e1b0f-c9ec-40ba-8302-1ed3668fe7fb   none      none          off        
    oxp_b13438d8-60f9-412b-99a1-1ac31bb5c8da/crypt/zone/oxz_oximeter_da1ce34d-e1af-4964-9b6f-9161ea8313e2          6227594b-7e94-4b59-b22b-049f7bf88846   none      none          off        
    oxp_180ce178-96c5-404f-bf53-72fc0e224918/crypt/debug                                                           4579552d-8a0f-439b-8f3b-54e2f8bee4b6   100 GiB   none          gzip-9     
    oxp_20f48049-0113-4ecb-9131-97b84407755e/crypt/debug                                                           78755664-dd4e-44f9-a19d-e1ac818e1292   100 GiB   none          gzip-9     
    oxp_3ce5fd63-3acf-4ab2-8d6a-94cef5726cb2/crypt/debug                                                           b508261d-d042-4427-8f65-122599afed5c   100 GiB   none          gzip-9     
    oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/debug                                                           6fd465b4-5b7c-44c6-a18c-4262127af2c1   100 GiB   none          gzip-9     
    oxp_7388fbf2-f5a8-4bbd-9e9e-e6635dcf1dc1/crypt/debug                                                           304d75d5-a4cb-4a6d-b2d1-a400a4fc66a2   100 GiB   none          gzip-9     
    oxp_7a1dbf01-5ee3-4d9c-9d0e-afaa477274e6/crypt/debug                                                           f7273617-1933-483f-bda8-67065f7bc4c3   100 GiB   none          gzip-9     
    oxp_869d00da-0c89-46ef-962a-62311c388b5f/crypt/debug                                                           d69eea36-d2f2-4c6d-a8c5-3091003c3535   100 GiB   none          gzip-9     
    oxp_b13438d8-60f9-412b-99a1-1ac31bb5c8da/crypt/debug                                                           cae97af8-8978-4304-b1f1-2f31e6934d7f   100 GiB   none          gzip-9     
    oxp_ef481864-f3e1-4add-a8f6-7e276b2cf9f9/crypt/debug                                                           46dffd55-9b61-451f-a4a1-414da409cd2f   100 GiB   none          gzip-9     
*   oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_ntp_08c5f361-a075-4740-b92d-db42fd8956c3               bd41f955-725f-4346-bd60-35c941402445   none      none          off        
+   oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/cockroachdb                                                     1d4d86e7-146a-45b6-b359-ce6e0ce6cea0   none      none          off        
+   oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/clickhouse                                                      49e13d08-ef3d-45f7-a84f-e2d2fb09aa45   none      none          off        
+   oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/internal_dns                                                    de6228ae-25cd-4b04-8159-ce770ea836bd   none      none          off        
+   oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_clickhouse_653ad0f9-f5eb-4f48-978d-10b8af1083c6        d98504b3-f190-4ffb-bcf7-b1848dad919a   none      none          off        
+   oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_cockroachdb_e66711a0-d4e4-4cf5-8430-45ac4d6619ed       606c1fd6-0b76-4ec1-a9d2-5181b9af125f   none      none          off        
+   oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_crucible_pantry_27a1b2e2-d824-4fa4-8e49-47526f29b20c   7a82623c-14ec-4173-8894-72c709458844   none      none          off        
+   oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_internal_dns_38d8aeed-85d1-456f-8079-525242cb1e1a      b8701be8-dfee-4d2f-bcab-69022901fabb   none      none          off        
+   oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_nexus_d89c91ef-0a16-45f3-86c3-6a098f4a93f9             a8b03eed-e7a2-4502-ba59-e8930ee74bf0   none      none          off        
+   oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_ntp_7dff3085-e1d7-4e42-ba61-d25542a04363               07d8821f-c7ed-4119-9e9c-04ffce83bb7e   none      none          off        


    omicron zones generation 9 -> 10:
    ----------------------------------------------------------------------------------------------
    zone type         zone id                                disposition    underlay IP           
    ----------------------------------------------------------------------------------------------
    boundary_ntp      fad31fff-b59e-4089-bd57-1963f0288e18   expunged       fd00:1122:3344:101::10
    cockroach_db      7cdbdac2-b593-491b-8d49-be37b0d79cb0   expunged       fd00:1122:3344:101::3 
    crucible          054eb2e1-f3c5-489e-8fa2-a8841cc7e314   in service     fd00:1122:3344:101::d 
    crucible          0cdc46de-ac07-4030-ba67-877904f1a128   in service     fd00:1122:3344:101::9 
    crucible          40d52af0-2f5e-48c5-8c5b-85abfd184715   expunged       fd00:1122:3344:101::a 
    crucible          85e38d2e-98e7-4f85-b2a9-b10f1aa9ae00   expunged       fd00:1122:3344:101::b 
    crucible          8d0b659c-20d1-4e2b-9f44-d09aad60d3b6   expunged       fd00:1122:3344:101::e 
    crucible          92754cec-c2b3-475a-9b3a-a2f3f61a435d   expunged       fd00:1122:3344:101::8 
    crucible          c835a985-cd6f-4377-8239-9ef2dd1dd3a4   expunged       fd00:1122:3344:101::f 
    crucible          cfb2dccf-5285-46ec-aa43-f45a7e7a9f00   expunged       fd00:1122:3344:101::7 
    crucible          d25bbfa4-09ea-47cd-99f9-b08ab0ccf27d   expunged       fd00:1122:3344:101::c 
    crucible_pantry   27a6ffdf-1c83-4762-99dc-32b614168e2d   expunged       fd00:1122:3344:101::6 
    external_dns      c6033445-ee85-4d99-96ba-d53252db7e5e   expunged       fd00:1122:3344:101::4 
    internal_dns      adc881b8-8e0b-4078-ac24-bc80dfee7187   expunged       fd00:1122:3344:1::1   
    internal_ntp      2a1f8dff-e5e6-4489-8bb1-48fd194f3d68   expunged       fd00:1122:3344:101::21
    internal_ntp      afba98bf-83d2-4ed4-8c5e-7ae805d36e66   expunged       fd00:1122:3344:101::22
    oximeter          2d621598-a72b-4524-8f94-7c3f5eec4aa6   in service     fd00:1122:3344:101::24
    oximeter          da1ce34d-e1af-4964-9b6f-9161ea8313e2   expunged       fd00:1122:3344:101::5 
*   internal_ntp      08c5f361-a075-4740-b92d-db42fd8956c3   - in service   fd00:1122:3344:101::23
     └─                                                      + expunged                           
+   boundary_ntp      7dff3085-e1d7-4e42-ba61-d25542a04363   in service     fd00:1122:3344:101::25
+   clickhouse        653ad0f9-f5eb-4f48-978d-10b8af1083c6   in service     fd00:1122:3344:101::26
+   cockroach_db      e66711a0-d4e4-4cf5-8430-45ac4d6619ed   in service     fd00:1122:3344:101::27
+   crucible_pantry   27a1b2e2-d824-4fa4-8e49-47526f29b20c   in service     fd00:1122:3344:101::28
+   internal_dns      38d8aeed-85d1-456f-8079-525242cb1e1a   in service     fd00:1122:3344:2::1   
+   nexus             d89c91ef-0a16-45f3-86c3-6a098f4a93f9   in service     fd00:1122:3344:101::29


 COCKROACHDB SETTINGS:
    state fingerprint:::::::::::::::::   d4d87aa2ad877a4cc2fddd0573952362739110de (unchanged)
    cluster.preserve_downgrade_option:   "22.1" (unchanged)

 METADATA:
*   internal DNS version:   6 -> 7
    external DNS version:   4 (unchanged)

I enabled the new blueprint and saw that the executor failed to complete it:

root@oxz_switch0:~# omdb -w nexus blueprints target set 094f1432-d9b6-404a-b8e9-005f85c05143 enabled
set target blueprint to 094f1432-d9b6-404a-b8e9-005f85c05143

root@oxz_switch0:~# omdb nexus background-tasks show blueprint_executor 
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::4]:12221
task: "blueprint_executor"
  configured period: every 1m
  currently executing: iter 1544, triggered by a periodic timer firing
    started at 2025-01-17T21:19:04.625Z, running for 1956ms
  last completed activation: iter 1543, triggered by a periodic timer firing
    started at 2025-01-17T21:18:01.995Z (64s ago) and ran for 62627ms
    target blueprint: 094f1432-d9b6-404a-b8e9-005f85c05143                                                                    
    execution:        enabled                                                                                                 
    status:           failed at: Deploy Omicron zones (step 6/17)                                                             
    error:            step failed: Deploy Omicron zones

(The complete error message and other debug info are captured in the following comments because of the character-length limit on issue comments.)
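
For anyone who wants to check for this state from a script rather than by eye, here is a minimal sketch that pulls the `status` and `error` fields out of the `blueprint_executor` summary shown above. It assumes the text layout matches the output pasted in this issue; that layout is not a stable interface, and the function names (`summarize_executor_status`, `failed_step`) are hypothetical helpers, not part of omdb.

```python
import re

# Sample lines copied from the omdb output in this issue.
SAMPLE = """\
task: "blueprint_executor"
  configured period: every 1m
  last completed activation: iter 1543, triggered by a periodic timer firing
    started at 2025-01-17T21:18:01.995Z (64s ago) and ran for 62627ms
    target blueprint: 094f1432-d9b6-404a-b8e9-005f85c05143
    execution:        enabled
    status:           failed at: Deploy Omicron zones (step 6/17)
    error:            step failed: Deploy Omicron zones
"""

# Assumption: each field of interest sits on its own "key: value" line.
FIELD_RE = re.compile(r"^\s*(target blueprint|execution|status|error):\s+(.*?)\s*$")


def summarize_executor_status(text):
    """Return the first occurrence of each field of interest as a dict."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m and m.group(1) not in fields:
            fields[m.group(1)] = m.group(2)
    return fields


def failed_step(fields):
    """Return (step name, step index, step count) if the status reports a failure."""
    m = re.match(r"failed at: (.*) \(step (\d+)/(\d+)\)$", fields.get("status", ""))
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


if __name__ == "__main__":
    fields = summarize_executor_status(SAMPLE)
    print(fields["target blueprint"], failed_step(fields))
```

In practice the input would be the captured output of `omdb nexus background-tasks show blueprint_executor` instead of the embedded sample.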

askfongjojo commented Jan 17, 2025

root@oxz_switch0:~# omdb nexus background-tasks show blueprint_executor 
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::4]:12221
task: "blueprint_executor"
  configured period: every 1m
  currently executing: iter 1544, triggered by a periodic timer firing
    started at 2025-01-17T21:19:04.625Z, running for 1956ms
  last completed activation: iter 1543, triggered by a periodic timer firing
    started at 2025-01-17T21:18:01.995Z (64s ago) and ran for 62627ms
    target blueprint: 094f1432-d9b6-404a-b8e9-005f85c05143                                                                    
    execution:        enabled                                                                                                 
    status:           failed at: Deploy Omicron zones (step 6/17)                                                             
    error:            step failed: Deploy Omicron zones
      caused by:      Failed to put OmicronZonesConfig {                                                                      
                          generation: Generation(                                                                             
                              10,                                                                                             
                          ),                                                                                                  
                          zones: [                                                                                            
                              OmicronZoneConfig {                                                                             
                                  id: 054eb2e1-f3c5-489e-8fa2-a8841cc7e314 (service),                                         
                                  filesystem_pool: Some(                                                                      
                                      ZpoolName {                                                                             
                                          id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                                   
                                          kind: External,                                                                     
                                      },                                                                                      
                                  ),                                                                                          
                                  zone_type: Crucible {                                                                       
                                      address: [fd00:1122:3344:101::d]:32345,                                                 
                                      dataset: OmicronZoneDataset {                                                           
                                          pool_name: ZpoolName {                                                              
                                              id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                               
                                              kind: External,                                                                 
                                          },                                                                                  
                                      },                                                                                      
                                  },                                                                                          
                              },                                                                                              
                              OmicronZoneConfig {                                                                             
                                  id: 0cdc46de-ac07-4030-ba67-877904f1a128 (service),                                         
                                  filesystem_pool: Some(                                                                      
                                      ZpoolName {                                                                             
                                          id: 7388fbf2-f5a8-4bbd-9e9e-e6635dcf1dc1 (zpool),                                   
                                          kind: External,                                                                     
                                      },                                                                                      
                                  ),                                                                                          
                                  zone_type: Crucible {                                                                       
                                      address: [fd00:1122:3344:101::9]:32345,                                                 
                                      dataset: OmicronZoneDataset {                                                           
                                          pool_name: ZpoolName {                                                              
                                              id: 7388fbf2-f5a8-4bbd-9e9e-e6635dcf1dc1 (zpool),                               
                                              kind: External,                                                                 
                                          },                                                                                  
                                      },                                                                                      
                                  },                                                                                          
                              },                                                                                              
                              OmicronZoneConfig {                                                                             
                                  id: 27a1b2e2-d824-4fa4-8e49-47526f29b20c (service),                                         
                                  filesystem_pool: Some(                                                                      
                                      ZpoolName {                                                                             
                                          id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                                   
                                          kind: External,                                                                     
                                      },                                                                                      
                                  ),                                                                                          
                                  zone_type: CruciblePantry {                                                                 
                                      address: [fd00:1122:3344:101::28]:17000,                                                
                                  },                                                                                          
                              },                                                                                              
                              OmicronZoneConfig {                                                                             
                                  id: 2d621598-a72b-4524-8f94-7c3f5eec4aa6 (service),                                         
                                  filesystem_pool: Some(                                                                      
                                      ZpoolName {                                                                             
                                          id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                                   
                                          kind: External,                                                                     
                                      },                                                                                      
                                  ),                                                                                          
                                  zone_type: Oximeter {                                                                       
                                      address: [fd00:1122:3344:101::24]:12223,                                                
                                  },                                                                                          
                              },                                                                                              
                              OmicronZoneConfig {                                                                             
                                  id: 38d8aeed-85d1-456f-8079-525242cb1e1a (service),                                         
                                  filesystem_pool: Some(                                                                      
                                      ZpoolName {                                                                             
                                          id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                                   
                                          kind: External,                                                                     
                                      },                                                                                      
                                  ),                                                                                          
                                  zone_type: InternalDns {                                                                    
                                      dataset: OmicronZoneDataset {                                                           
                                          pool_name: ZpoolName {                                                              
                                              id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                               
                                              kind: External,                                                                 
                                          },                                                                                  
                                      },                                                                                      
                                      http_address: [fd00:1122:3344:2::1]:5353,                                               
                                      dns_address: [fd00:1122:3344:2::1]:53,                                                  
                                      gz_address: fd00:1122:3344:2::2,                                                        
                                      gz_address_index: 0,                                                                    
                                  },                                                                                          
                              },                                                                                              
                              OmicronZoneConfig {                                                                             
                                  id: 653ad0f9-f5eb-4f48-978d-10b8af1083c6 (service),                                         
                                  filesystem_pool: Some(                                                                      
                                      ZpoolName {                                                                             
                                          id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                                   
                                          kind: External,                                                                     
                                      },                                                                                      
                                  ),                                                                                          
                                  zone_type: Clickhouse {                                                                     
                                      address: [fd00:1122:3344:101::26]:8123,                                                 
                                      dataset: OmicronZoneDataset {                                                           
                                          pool_name: ZpoolName {                                                              
                                              id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                               
                                              kind: External,                                                                 
                                          },                                                                                  
                                      },                                                                                      
                                  },                                                                                          
                              },                                                                                              
                              OmicronZoneConfig {                                                                             
                                  id: 7dff3085-e1d7-4e42-ba61-d25542a04363 (service),                                         
                                  filesystem_pool: Some(                                                                      
                                      ZpoolName {                                                                             
                                          id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                                   
                                          kind: External,                                                                     
                                      },                                                                                      
                                  ),                                                                                          
                                  zone_type: BoundaryNtp {                                                                    
                                      address: [fd00:1122:3344:101::25]:123,                                                  
                                      ntp_servers: [                                                                          
                                          "ntp.eng.oxide.computer",                                                           
                                      ],                                                                                      
                                      dns_servers: [                                                                          
                                          1.1.1.1,                                                                            
                                          9.9.9.9,                                                                            
                                      ],                                                                                      
                                      domain: None,                                                                           
                                      nic: NetworkInterface {                                                                 
                                          id: d46ef962-0eb4-44c8-9a85-af5b07d5abf8,                                           
                                          kind: Service {                                                                     
                                              id: 7dff3085-e1d7-4e42-ba61-d25542a04363,                                       
                                          },                                                                                  
                                          name: Name(                                                                         
                                              "ntp-7dff3085-e1d7-4e42-ba61-d25542a04363",                                     
                                          ),                                                                                  
                                          ip: 172.30.3.6,                                                                     
                                          mac: MacAddr(                                                                       
                                              MacAddr6(                                                                       
                                                  [                                                                           
                                                      168,                                                                    
                                                      64,                                                                     
                                                      37,                                                                     
                                                      255,                                                                    
                                                      128,                                                                    
                                                      2,                                                                      
                                                  ],                                                                          
                                              ),                                                                              
                                          ),                                                                                  
                                          subnet: V4(                                                                         
                                              Ipv4Net {                                                                       
                                                  addr: 172.30.3.0,                                                           
                                                  width: 24,                                                                  
                                              },                                                                              
                                          ),                                                                                  
                                          vni: Vni(                                                                           
                                              100,                                                                            
                                          ),                                                                                  
                                          primary: true,                                                                      
                                          slot: 0,                                                                            
                                          transit_ips: [],                                                                    
                                      },                                                                                      
                                      snat_cfg: SourceNatConfig {                                                             
                                          ip: 172.20.29.5,                                                                    
                                          first_port: 16384,                                                                  
                                          last_port: 32767,                                                                   
                                      },                                                                                      
                                  },                                                                                          
                              },                                                                                              
                              OmicronZoneConfig {                                                                             
                                  id: d89c91ef-0a16-45f3-86c3-6a098f4a93f9 (service),                                         
                                  filesystem_pool: Some(                                                                      
                                      ZpoolName {                                                                             
                                          id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                                   
                                          kind: External,                                                                     
                                      },                                                                                      
                                  ),                                                                                          
                                  zone_type: Nexus {                                                                          
                                      internal_address: [fd00:1122:3344:101::29]:12221,                                       
                                      external_ip: 172.20.29.2,                                                               
                                      nic: NetworkInterface {                                                                 
                                          id: 592fcdf0-b310-4b06-8da7-f04a549649b6,                                           
                                          kind: Service {                                                                     
                                              id: d89c91ef-0a16-45f3-86c3-6a098f4a93f9,                                       
                                          },                                                                                  
                                          name: Name(                                                                         
                                              "nexus-d89c91ef-0a16-45f3-86c3-6a098f4a93f9",                                   
                                          ),                                                                                  
                                          ip: 172.30.2.5,                                                                     
                                          mac: MacAddr(                                                                       
                                              MacAddr6(                                                                       
                                                  [                                                                           
                                                      168,                                                                    
                                                      64,                                                                     
                                                      37,                                                                     
                                                      255,                                                                    
                                                      128,                                                                    
                                                      3,                                                                      
                                                  ],                                                                          
                                              ),                                                                              
                                          ),                                                                                  
                                          subnet: V4(                                                                         
                                              Ipv4Net {                                                                       
                                                  addr: 172.30.2.0,                                                           
                                                  width: 24,                                                                  
                                              },                                                                              
                                          ),                                                                                  
                                          vni: Vni(                                                                           
                                              100,                                                                            
                                          ),                                                                                  
                                          primary: true,                                                                      
                                          slot: 0,                                                                            
                                          transit_ips: [],                                                                    
                                      },                                                                                      
                                      external_tls: true,                                                                     
                                      external_dns_servers: [                                                                 
                                          1.1.1.1,                                                                            
                                          9.9.9.9,                                                                            
                                      ],                                                                                      
                                  },                                                                                          
                              },                                                                                              
                              OmicronZoneConfig {                                                                             
                                  id: e66711a0-d4e4-4cf5-8430-45ac4d6619ed (service),                                         
                                  filesystem_pool: Some(                                                                      
                                      ZpoolName {                                                                             
                                          id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                                   
                                          kind: External,                                                                     
                                      },                                                                                      
                                  ),                                                                                          
                                  zone_type: CockroachDb {                                                                    
                                      address: [fd00:1122:3344:101::27]:32221,                                                
                                      dataset: OmicronZoneDataset {                                                           
                                          pool_name: ZpoolName {                                                              
                                              id: 45c07047-0d6b-48c3-8683-abacb200c299 (zpool),                               
                                              kind: External,                                                                 
                                          },                                                                                  
                                      },                                                                                      
                                  },                                                                                          
                              },                                                                                              
                          ],                                                                                                  
                      } to sled 61b7c861-993a-4590-9ca7-f506c59ee0a0                                                          
      caused by:      Communication Error: error sending request for url (http://[fd00:1122:3344:101::1]:12345/omicron-zones) 
      caused by:      error sending request for url (http://[fd00:1122:3344:101::1]:12345/omicron-zones)                      
      caused by:      operation timed out                                                                                     

@askfongjojo
Author

I logged into the sled in question. It's a scrimlet, and its sled-agent was up and running:

BRM42220026 # ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
lo0/v6            static   ok           ::1/128
igb0/ll           addrconf ok           fe80::eaea:6aff:fe09:86d2%igb0/10
cxgbe0/ll         addrconf ok           fe80::aa40:25ff:fe04:351%cxgbe0/10
cxgbe1/ll         addrconf ok           fe80::aa40:25ff:fe04:359%cxgbe1/10
bootstrap0/ll     addrconf ok           fe80::8:20ff:fefe:3c93%bootstrap0/10
bootstrap0/bootstrap6 static ok         fdb0:a840:2504:351::1/64
underlay0/ll      addrconf ok           fe80::8:20ff:fe01:880a%underlay0/10
underlay0/sled6   static   ok           fd00:1122:3344:101::1/64
underlay0/internaldns0 static ok        fd00:1122:3344:2::2/64

BRM42220026 # svcs sled-agent
STATE          STIME    FMRI
online         1986     svc:/oxide/sled-agent:default

BRM42220026 # zoneadm list
global
oxz_switch
oxz_crucible_054eb2e1-f3c5-489e-8fa2-a8841cc7e314
oxz_crucible_0cdc46de-ac07-4030-ba67-877904f1a128
oxz_oximeter_2d621598-a72b-4524-8f94-7c3f5eec4aa6
oxz_ntp_7dff3085-e1d7-4e42-ba61-d25542a04363
oxz_internal_dns_38d8aeed-85d1-456f-8079-525242cb1e1a
oxz_crucible_pantry_27a1b2e2-d824-4fa4-8e49-47526f29b20c
oxz_cockroachdb_e66711a0-d4e4-4cf5-8430-45ac4d6619ed
oxz_clickhouse_653ad0f9-f5eb-4f48-978d-10b8af1083c6

There are, however, many warnings in the sled-agent log related to DNS and client disconnection (the latter corresponding to the blueprint_executor error):

BRM42220026 # tail -f `svcs -L sled-agent` | looker -l warn
21:38:19.520Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    file = /home/iliana/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.15.1/src/server.rs:801
    latency_us = 60006071
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:102::4]:42346
    req_id = 00be0751-7b5d-4585-b629-9b83f1a06bd6
    uri = /inventory
21:38:28.337Z WARN SledAgent (ServiceManager): Failed to look up switch zone locations
    error = Error resolving dendrite services in internal DNS: no record found for Query { name: Name("_dendrite._tcp.control-plane.oxide.internal."), query_type: SRV, query_class: IN }
    file = sled-agent/src/bootstrap/early_networking.rs:233
    retry_after = 13.50179891s
21:38:44.481Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    file = /home/iliana/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.15.1/src/server.rs:801
    latency_us = 60003436
    local_addr = [fd00:1122:3344:101::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:103::4]:57579
    req_id = 023f9480-41d8-422a-9ae8-71d1d76752ee
    uri = /omicron-zones
21:38:44.503Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    file = /home/iliana/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dropshot-0.15.1/src/server.rs:801
    latency_us = 60003372
    local_addr = [fd00:1122:3344:101::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:102::4]:55149
    req_id = 2e8c2693-372b-4df5-a859-0c728d2c41e5
    uri = /omicron-zones
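
The cancelled requests above all have a latency_us just over 60,000,000, i.e. the client hung up after roughly 60 seconds each time. That is consistent with the "operation timed out" on the Nexus side, although the 60-second client timeout is an inference from these numbers and is not stated anywhere in the logs. A minimal Python sketch for pulling those entries out of looker-style output follows; the helper names parse_entries and cancelled_requests are made up for illustration, and the sample is the output above trimmed to the fields used:

```python
# Excerpt of the looker output above, trimmed to the fields used here.
LOG = """\
21:38:19.520Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    latency_us = 60006071
    method = GET
    uri = /inventory
21:38:28.337Z WARN SledAgent (ServiceManager): Failed to look up switch zone locations
    retry_after = 13.50179891s
21:38:44.481Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    latency_us = 60003436
    method = PUT
    uri = /omicron-zones
21:38:44.503Z WARN SledAgent (dropshot (SledAgent)): request handling cancelled (client disconnected)
    latency_us = 60003372
    method = PUT
    uri = /omicron-zones
"""


def parse_entries(text):
    """Group looker-style output into one dict per log entry.

    Assumes (as in the excerpt) that an entry starts on an unindented line
    and that its fields follow as indented "key = value" lines.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            entries.append({"message": line})
        elif entries and " = " in line:
            key, value = line.strip().split(" = ", 1)
            entries[-1][key] = value
    return entries


def cancelled_requests(text):
    """Return (method, uri, latency in seconds) for each cancelled request."""
    rows = []
    for entry in parse_entries(text):
        if "client disconnected" in entry["message"] and "latency_us" in entry:
            seconds = int(entry["latency_us"]) / 1_000_000
            rows.append((entry["method"], entry["uri"], seconds))
    return rows


if __name__ == "__main__":
    for method, uri, seconds in cancelled_requests(LOG):
        print(f"{method:4} {uri:15} {seconds:.3f}s")
```

Both the GET /inventory and the PUT /omicron-zones requests show up with the same ~60 s latency, so the hang does not appear to be specific to the zone-configuration endpoint.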

At this point, there are 4 internal_dns zones running. The one being expunged (4e5c6461) is still serving requests:

root@oxz_switch0:~# omdb nexus blueprints show current | grep internal_dns | grep -v oxp
note: Nexus URL not specified.  Will pick one from DNS.
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:1122:3344:103::4]:12221
    internal_dns      3c111c23-672d-42e1-834f-977a09255229   in service    fd00:1122:3344:1::1   
    internal_dns      e7501a9c-09ee-449a-a596-855cdb1aebc0   in service    fd00:1122:3344:3::1   
    internal_dns      38d8aeed-85d1-456f-8079-525242cb1e1a   in service    fd00:1122:3344:2::1   
    internal_dns      adc881b8-8e0b-4078-ac24-bc80dfee7187   expunged      fd00:1122:3344:1::1   
    internal_dns      4e5c6461-e3e8-4ece-9597-2bd9c14f70ff   expunged      fd00:1122:3344:2::1   

root@oxz_switch0:~# pilot host exec -c 'zoneadm list | grep internal_dns' 14-17
14  BRM42220026        ok: oxz_internal_dns_38d8aeed-85d1-456f-8079-525242cb1e1a
15  BRM27230037        ok: oxz_internal_dns_4e5c6461-e3e8-4ece-9597-2bd9c14f70ff
16  BRM23230018        ok: oxz_internal_dns_e7501a9c-09ee-449a-a596-855cdb1aebc0
17  BRM23230010        ok: oxz_internal_dns_3c111c23-672d-42e1-834f-977a09255229
root@oxz_switch0:~# pilot host login 15
BRM27230037 # zlogin oxz_internal_dns_4e5c6461-e3e8-4ece-9597-2bd9c14f70ff
[Connected to zone 'oxz_internal_dns_4e5c6461-e3e8-4ece-9597-2bd9c14f70ff' pts/3]
The illumos Project     helios-2.0.23114        January 2025
root@oxz_internal_dns_4e5c6461:~# netstat -an        

UDP: IPv4
   Local Address        Remote Address      State   
-------------------- -------------------- ----------
      *.*                                 Unbound
      *.68                                Idle
      *.546                               Idle

UDP: IPv6
   Local Address                     Remote Address                   State      If 
--------------------------------- --------------------------------- ---------- -----
      *.*                                                           Unbound    
fd00:1122:3344:2::1.53                                              Idle       
      *.546                                                         Idle       

TCP: IPv4
   Local Address        Remote Address    Swind  Send-Q Rwind  Recv-Q    State   
-------------------- -------------------- ------ ------ ------ ------ -----------
127.0.0.1.4999             *.*                 0      0 1000000      0 LISTEN

TCP: IPv6
   Local Address                     Remote Address                 Swind  Send-Q Rwind  Recv-Q    State      If 
--------------------------------- --------------------------------- ------ ------ ------ ------ ----------- -----
fd00:1122:3344:2::1.5353                *.*                              0      0 1000000      0 LISTEN      

Active UNIX domain sockets
Address          Type       Vnode            Conn             Local Address                           Remote Address                         
---------------- ---------- ---------------- ---------------- --------------------------------------- ---------------------------------------
fffffcfa49dc9b30 stream-ord 0000000          0000000                                                                                         
fffffcfa751b1048 stream-ord 0000000          0000000                                                                                         
fffffcfa47321038 dgram      fffffcfa35226200 0000000          /var/run/in.ndpd_mib                                                           
fffffcfa473213d8 stream-ord fffffcfa70e31a40 0000000          /var/run/in.ndpd_ipadm 
root@oxz_internal_dns_4e5c6461:~# tail -f `svcs -L internal_dns` | looker
22:15:16.004Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 20196,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 0,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("oxz_cockroachdb_c6a00d80-6287-447f-9e11-ad40baf15378."),
                ),
                original: Query {
                    name: Name("oxz_cockroachdb_c6a00d80-6287-447f-9e11-ad40baf15378."),
                    query_type: A,
                    query_class: IN,
                },
            },
            original: [
                52,
                111,
                120,
                122,
                95,
                99,
                111,
                99,
                107,
                114,
                111,
                97,
                99,
                104,
                100,
                98,
                95,
                99,
                54,
                97,
                48,
                48,
                100,
                56,
                48,
                45,
                54,
                50,
                56,
                55,
                45,
                52,
                52,
                55,
                102,
                45,
                57,
                101,
                49,
                49,
                45,
                97,
                100,
                52,
                48,
                98,
                97,
                102,
                49,
                53,
                51,
                55,
                56,
                0,
                0,
                1,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: None,
    } SERVFAIL: server is not authoritative for name: "oxz_cockroachdb_c6a00d80-6287-447f-9e11-ad40baf15378."
    peer_addr = [fd00:1122:3344:103::22]:57827
    req_id = 50a685b3-49cc-41f0-b0e4-6316ff863f2c
22:15:49.997Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 55325,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("_clickhouse-admin-keeper._tcp.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("_clickhouse-admin-keeper._tcp.control-plane.oxide.internal."),
                    query_type: SRV,
                    query_class: IN,
                },
            },
            original: [
                24,
                95,
                99,
                108,
                105,
                99,
                107,
                104,
                111,
                117,
                115,
                101,
                45,
                97,
                100,
                109,
                105,
                110,
                45,
                107,
                101,
                101,
                112,
                101,
                114,
                4,
                95,
                116,
                99,
                112,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                33,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } NXDOMAIN: no records found for name: "_clickhouse-admin-keeper._tcp.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:103::4]:61976
    req_id = 2fbe75e1-803a-411f-ac1c-ebf5b033448d
22:15:57.250Z INFO dns-server (http): accepted connection
    local_addr = [fd00:1122:3344:2::1]:5353
    remote_addr = [fd00:1122:3344:103::4]:41611
22:15:57.487Z INFO dns-server (store): attempting generation update
    new_generation = 7
    req_id = 35c7d176-eb4b-467e-8773-dcf6df8cbe29
22:15:57.487Z INFO dns-server (store): updated generation
    new_generation = 7
    req_id = 35c7d176-eb4b-467e-8773-dcf6df8cbe29
22:15:57.487Z INFO dns-server (http): request completed
    latency_us = 222
    local_addr = [fd00:1122:3344:2::1]:5353
    method = PUT
    remote_addr = [fd00:1122:3344:103::4]:41611
    req_id = 35c7d176-eb4b-467e-8773-dcf6df8cbe29
    response_code = 204
    uri = /config
22:15:58.243Z INFO dns-server (http): accepted connection
    local_addr = [fd00:1122:3344:2::1]:5353
    remote_addr = [fd00:1122:3344:104::5]:34819
22:15:58.437Z INFO dns-server (store): attempting generation update
    new_generation = 7
    req_id = 73abae10-9b23-4126-9437-bef1d320510b
22:15:58.437Z INFO dns-server (store): updated generation
    new_generation = 7
    req_id = 73abae10-9b23-4126-9437-bef1d320510b
22:15:58.437Z INFO dns-server (http): request completed
    latency_us = 292
    local_addr = [fd00:1122:3344:2::1]:5353
    method = PUT
    remote_addr = [fd00:1122:3344:104::5]:34819
    req_id = 73abae10-9b23-4126-9437-bef1d320510b
    response_code = 204
    uri = /config
22:15:58.831Z INFO dns-server (http): accepted connection
    local_addr = [fd00:1122:3344:2::1]:5353
    remote_addr = [fd00:1122:3344:102::4]:45726
22:15:59.174Z INFO dns-server (store): attempting generation update
    new_generation = 7
    req_id = a2352a01-fb85-447e-83d5-114a656ccc5d
22:15:59.174Z INFO dns-server (store): updated generation
    new_generation = 7
    req_id = a2352a01-fb85-447e-83d5-114a656ccc5d
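
For anyone reading these dumps: the `original: [...]` array in each failed query appears to be the raw wire-format question (length-prefixed labels, a zero terminator, then a 2-byte query type and a 2-byte class). Below is a minimal sketch for decoding it by hand, using the bytes from the `_clickhouse-admin-keeper` query above. The function name `decode_question` is my own and not part of any tool mentioned here. It assumes an uncompressed name, which is what these logs show.

```python
# Sketch: decode a DNS question section as dumped in the `original` field.
# Assumes no name compression (no 0xC0 pointers), as in the logs above.

QTYPES = {1: "A", 28: "AAAA", 33: "SRV"}   # only the types seen in these logs
QCLASSES = {1: "IN"}


def decode_question(raw):
    """Return (name, qtype, qclass) for an uncompressed DNS question."""
    labels = []
    pos = 0
    while True:
        length = raw[pos]          # each label is prefixed with its length
        pos += 1
        if length == 0:            # zero-length label terminates the name
            break
        labels.append(bytes(raw[pos:pos + length]).decode("ascii"))
        pos += length
    qtype = (raw[pos] << 8) | raw[pos + 1]       # big-endian 16-bit type
    qclass = (raw[pos + 2] << 8) | raw[pos + 3]  # big-endian 16-bit class
    name = ".".join(labels) + "."
    return name, QTYPES.get(qtype, str(qtype)), QCLASSES.get(qclass, str(qclass))


# Bytes copied from the 22:15:49.997Z entry above.
raw = [
    24, 95, 99, 108, 105, 99, 107, 104, 111, 117, 115, 101, 45, 97, 100,
    109, 105, 110, 45, 107, 101, 101, 112, 101, 114,
    4, 95, 116, 99, 112,
    13, 99, 111, 110, 116, 114, 111, 108, 45, 112, 108, 97, 110, 101,
    5, 111, 120, 105, 100, 101,
    8, 105, 110, 116, 101, 114, 110, 97, 108,
    0,
    0, 33, 0, 1,
]

print(decode_question(raw))
# ('_clickhouse-admin-keeper._tcp.control-plane.oxide.internal.', 'SRV', 'IN')
```

The decoded name and type match the `name` and `query_type` fields printed in the same log entry, so the byte array carries no extra information beyond what the structured fields already show.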

Author

askfongjojo commented Jan 17, 2025

The newly provisioned internal_dns zone (not the one being expunged) appears to be the problematic DNS peer that is returning the DNS lookup errors:

BRM42220026 # tail -10 /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/debug/oxz_internal_dns_38d8aeed-85d1-456f-8079-525242cb1e1a/oxide-internal_dns:default.log.1737142178 | looker
19:28:57.864Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 61724,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("_dendrite._tcp.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("_dendrite._tcp.control-plane.oxide.internal."),
                    query_type: SRV,
                    query_class: IN,
                },
            },
            original: [
                9,
                95,
                100,
                101,
                110,
                100,
                114,
                105,
                116,
                101,
                4,
                95,
                116,
                99,
                112,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                33,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } SERVFAIL: server is not authoritative for name: "_dendrite._tcp.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:2::2]:60924
    req_id = 051832cd-2bfc-42f0-a34b-c570ffb69e66
19:29:05.426Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 26785,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("_clickhouse-native._tcp.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("_clickhouse-native._tcp.control-plane.oxide.internal."),
                    query_type: SRV,
                    query_class: IN,
                },
            },
            original: [
                18,
                95,
                99,
                108,
                105,
                99,
                107,
                104,
                111,
                117,
                115,
                101,
                45,
                110,
                97,
                116,
                105,
                118,
                101,
                4,
                95,
                116,
                99,
                112,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                33,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } SERVFAIL: server is not authoritative for name: "_clickhouse-native._tcp.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:101::24]:60247
    req_id = 2bd20eaf-56a9-484d-b328-a039c88e06c4
19:29:15.453Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 12172,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("_nexus._tcp.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("_nexus._tcp.control-plane.oxide.internal."),
                    query_type: SRV,
                    query_class: IN,
                },
            },
            original: [
                6,
                95,
                110,
                101,
                120,
                117,
                115,
                4,
                95,
                116,
                99,
                112,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                33,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } SERVFAIL: server is not authoritative for name: "_nexus._tcp.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:101::24]:61239
    req_id = 875656d0-198c-49e5-90c7-aa24373e29f2
19:29:19.098Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 42123,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("_dendrite._tcp.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("_dendrite._tcp.control-plane.oxide.internal."),
                    query_type: SRV,
                    query_class: IN,
                },
            },
            original: [
                9,
                95,
                100,
                101,
                110,
                100,
                114,
                105,
                116,
                101,
                4,
                95,
                116,
                99,
                112,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                33,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } SERVFAIL: server is not authoritative for name: "_dendrite._tcp.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:2::2]:58276
    req_id = 3cb54483-a284-477f-8738-25d6916ac5f2
19:29:25.060Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 62279,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("_nexus._tcp.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("_nexus._tcp.control-plane.oxide.internal."),
                    query_type: SRV,
                    query_class: IN,
                },
            },
            original: [
                6,
                95,
                110,
                101,
                120,
                117,
                115,
                4,
                95,
                116,
                99,
                112,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                33,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } SERVFAIL: server is not authoritative for name: "_nexus._tcp.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:101::2]:53608
    req_id = 734cd189-3a7c-4ef1-becf-a1d56306b9e8
19:29:25.061Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 64778,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("41075b06-a9b3-4a05-8d15-1af00b73f8f6.host.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("41075b06-a9b3-4a05-8d15-1af00b73f8f6.host.control-plane.oxide.internal."),
                    query_type: AAAA,
                    query_class: IN,
                },
            },
            original: [
                36,
                52,
                49,
                48,
                55,
                53,
                98,
                48,
                54,
                45,
                97,
                57,
                98,
                51,
                45,
                52,
                97,
                48,
                53,
                45,
                56,
                100,
                49,
                53,
                45,
                49,
                97,
                102,
                48,
                48,
                98,
                55,
                51,
                102,
                56,
                102,
                54,
                4,
                104,
                111,
                115,
                116,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                28,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } SERVFAIL: server is not authoritative for name: "41075b06-a9b3-4a05-8d15-1af00b73f8f6.host.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:101::2]:64614
    req_id = 459db9e3-4684-461e-8814-4ba294aba328
19:29:25.061Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 19198,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("8c5cbf3e-5cd0-4247-92b4-e2b492be640f.host.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("8c5cbf3e-5cd0-4247-92b4-e2b492be640f.host.control-plane.oxide.internal."),
                    query_type: AAAA,
                    query_class: IN,
                },
            },
            original: [
                36,
                56,
                99,
                53,
                99,
                98,
                102,
                51,
                101,
                45,
                53,
                99,
                100,
                48,
                45,
                52,
                50,
                52,
                55,
                45,
                57,
                50,
                98,
                52,
                45,
                101,
                50,
                98,
                52,
                57,
                50,
                98,
                101,
                54,
                52,
                48,
                102,
                4,
                104,
                111,
                115,
                116,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                28,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } SERVFAIL: server is not authoritative for name: "8c5cbf3e-5cd0-4247-92b4-e2b492be640f.host.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:101::2]:55077
    req_id = a4b265b7-e938-4657-af85-87d549c8c6bf
19:29:25.061Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 33248,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("781c3f21-fc5a-49b4-baf7-51a91a66d1eb.host.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("781c3f21-fc5a-49b4-baf7-51a91a66d1eb.host.control-plane.oxide.internal."),
                    query_type: AAAA,
                    query_class: IN,
                },
            },
            original: [
                36,
                55,
                56,
                49,
                99,
                51,
                102,
                50,
                49,
                45,
                102,
                99,
                53,
                97,
                45,
                52,
                57,
                98,
                52,
                45,
                98,
                97,
                102,
                55,
                45,
                53,
                49,
                97,
                57,
                49,
                97,
                54,
                54,
                100,
                49,
                101,
                98,
                4,
                104,
                111,
                115,
                116,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                28,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } SERVFAIL: server is not authoritative for name: "781c3f21-fc5a-49b4-baf7-51a91a66d1eb.host.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:101::2]:49825
    req_id = 1ab90814-a3d2-4409-8793-cf18bd6cce60
19:29:33.604Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 59993,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("_nexus._tcp.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("_nexus._tcp.control-plane.oxide.internal."),
                    query_type: SRV,
                    query_class: IN,
                },
            },
            original: [
                6,
                95,
                110,
                101,
                120,
                117,
                115,
                4,
                95,
                116,
                99,
                112,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                33,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } SERVFAIL: server is not authoritative for name: "_nexus._tcp.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:2::2]:50304
    req_id = 88d1a958-3702-49d6-b3d9-2d24d8b4e3ff
19:29:38.777Z ERRO dns-server (dns): failed to handle incoming DNS message: MessageRequest {
        header: Header {
            id: 7181,
            message_type: Query,
            op_code: Query,
            authoritative: false,
            truncation: false,
            recursion_desired: true,
            recursion_available: false,
            authentic_data: false,
            checking_disabled: false,
            response_code: NoError,
            query_count: 1,
            answer_count: 0,
            name_server_count: 0,
            additional_count: 1,
        },
        query: WireQuery {
            query: LowerQuery {
                name: LowerName(
                    Name("_dendrite._tcp.control-plane.oxide.internal."),
                ),
                original: Query {
                    name: Name("_dendrite._tcp.control-plane.oxide.internal."),
                    query_type: SRV,
                    query_class: IN,
                },
            },
            original: [
                9,
                95,
                100,
                101,
                110,
                100,
                114,
                105,
                116,
                101,
                4,
                95,
                116,
                99,
                112,
                13,
                99,
                111,
                110,
                116,
                114,
                111,
                108,
                45,
                112,
                108,
                97,
                110,
                101,
                5,
                111,
                120,
                105,
                100,
                101,
                8,
                105,
                110,
                116,
                101,
                114,
                110,
                97,
                108,
                0,
                0,
                33,
                0,
                1,
            ],
        },
        answers: [],
        name_servers: [],
        additionals: [],
        sig0: [],
        edns: Some(
            Edns {
                rcode_high: 0,
                version: 0,
                dnssec_ok: false,
                max_payload: 1232,
                options: OPT {
                    options: {},
                },
            },
        ),
    } SERVFAIL: server is not authoritative for name: "_dendrite._tcp.control-plane.oxide.internal."
    peer_addr = [fd00:1122:3344:2::2]:62392
    req_id = d1ae5698-edd0-43b1-a1c9-5a28d32352e9
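
For readers unfamiliar with the `original: [...]` arrays in these log entries: they are the raw bytes of the DNS question section (length-prefixed labels, a zero terminator, then a 16-bit type and class). The following is a minimal sketch of decoding them, not code from dns-server; `decode_question` and `NEXUS_QUERY` are hypothetical names, and name compression is deliberately not handled.

```rust
/// Bytes copied from the `_nexus._tcp` query logged above.
const NEXUS_QUERY: &[u8] = &[
    6, 95, 110, 101, 120, 117, 115, 4, 95, 116, 99, 112, 13, 99, 111, 110,
    116, 114, 111, 108, 45, 112, 108, 97, 110, 101, 5, 111, 120, 105, 100,
    101, 8, 105, 110, 116, 101, 114, 110, 97, 108, 0, 0, 33, 0, 1,
];

/// Decode an uncompressed DNS question: length-prefixed labels ending in a
/// zero byte, followed by big-endian 16-bit QTYPE and QCLASS values.
/// Returns None on truncated input or on a compression pointer, which this
/// sketch does not support.
fn decode_question(bytes: &[u8]) -> Option<(String, u16, u16)> {
    let mut name = String::new();
    let mut pos = 0;
    loop {
        let len = *bytes.get(pos)? as usize;
        pos += 1;
        if len == 0 {
            break; // the root label terminates the name
        }
        if len > 63 {
            return None; // compression pointer or invalid label length
        }
        let label = bytes.get(pos..pos + len)?;
        name.push_str(std::str::from_utf8(label).ok()?);
        name.push('.');
        pos += len;
    }
    if name.is_empty() {
        name.push('.'); // the root name itself
    }
    let tail = bytes.get(pos..pos + 4)?;
    let qtype = u16::from_be_bytes([tail[0], tail[1]]);
    let qclass = u16::from_be_bytes([tail[2], tail[3]]);
    Some((name, qtype, qclass))
}

fn main() {
    match decode_question(NEXUS_QUERY) {
        Some((name, qtype, qclass)) => {
            println!("name={name} type={qtype} class={qclass}")
        }
        None => println!("could not decode question"),
    }
}
```

Type 33 is SRV and class 1 is IN, matching the `query_type` and `query_class` fields printed in the same log entry.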

The complete log file of the internal_dns at startup time:
oxide-internal_dns.log

@askfongjojo
Author

askfongjojo commented Jan 18, 2025

The problematic internal_dns zone oxz_internal_dns_38d8aeed-85d1-456f-8079-525242cb1e1a happened to be on the sled from which I previously expunged some disks (cubby 14, 61b7c861-993a-4590-9ca7-f506c59ee0a0).

root@oxz_switch0:~# omdb db sleds
 SERIAL       IP                             ROLE      POLICY      STATE   ID                                   
 BRM27230037  [fd00:1122:3344:102::1]:12345  -         expunged    active  104dc345-936a-4652-97da-374ae06c30e5 
 BRM23230010  [fd00:1122:3344:104::1]:12345  -         in service  active  2525e504-c8e3-442c-9f3c-196353e22050 
 BRM23230018  [fd00:1122:3344:103::1]:12345  scrimlet  in service  active  32049c15-d684-4738-9c64-9a79485cf88f 
 BRM42220026  [fd00:1122:3344:101::1]:12345  scrimlet  in service  active  61b7c861-993a-4590-9ca7-f506c59ee0a0 

root@oxz_switch0:~# pilot host ls
CUBBY IP                        SERIAL      IMAGE
14    fe80::aa40:25ff:fe04:351  BRM42220026 ci f0ef9fe/c2ef257 2025-01-16 18:05
15    fe80::aa40:25ff:fe04:614  BRM27230037 ci f0ef9fe/c2ef257 2025-01-16 18:05
16    fe80::aa40:25ff:fe04:6d3  BRM23230018 ci f0ef9fe/c2ef257 2025-01-16 18:05
17    fe80::aa40:25ff:fe04:851  BRM23230010 ci f0ef9fe/c2ef257 2025-01-16 18:05

The sled appears healthy generally and the disk space of the dataset being used for provisioning new zones seems adequate:

BRM42220026 # zoneadm list
global
oxz_switch
oxz_crucible_054eb2e1-f3c5-489e-8fa2-a8841cc7e314
oxz_crucible_0cdc46de-ac07-4030-ba67-877904f1a128
oxz_oximeter_2d621598-a72b-4524-8f94-7c3f5eec4aa6
oxz_ntp_7dff3085-e1d7-4e42-ba61-d25542a04363
oxz_crucible_pantry_27a1b2e2-d824-4fa4-8e49-47526f29b20c
oxz_cockroachdb_e66711a0-d4e4-4cf5-8430-45ac4d6619ed
oxz_clickhouse_653ad0f9-f5eb-4f48-978d-10b8af1083c6
oxz_internal_dns_38d8aeed-85d1-456f-8079-525242cb1e1a
BRM42220026 # svcs -xZ
BRM42220026 # 

BRM42220026 # zfs list -r oxp_45c07047-0d6b-48c3-8683-abacb200c299
NAME                                                                                                           USED  AVAIL     REFER  MOUNTPOINT
oxp_45c07047-0d6b-48c3-8683-abacb200c299                                                                      7.93G  2.81T       96K  /oxp_45c07047-0d6b-48c3-8683-abacb200c299
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crucible                                                             2.50G  2.81T      108K  /data
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crucible/regions                                                     2.50G  2.81T       96K  /data/regions
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crucible/regions/1a9dd200-6f7d-4550-8361-21af3b4530ed                1.13G  4.87G     1.13G  /data/regions/1a9dd200-6f7d-4550-8361-21af3b4530ed
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt                                                                5.42G  2.81T      248K  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/clickhouse                                                      295M  2.81T      295M  /data
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/cockroachdb                                                    1.47G  2.81T     1.47G  /data
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/debug                                                           339M  99.7G      339M  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/debug
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/internal_dns                                                    220K  2.81T      220K  /data
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone                                                           3.33G  2.81T      480K  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_clickhouse_653ad0f9-f5eb-4f48-978d-10b8af1083c6       1.02G  2.81T     1.02G  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_clickhouse_653ad0f9-f5eb-4f48-978d-10b8af1083c6
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_cockroachdb_e66711a0-d4e4-4cf5-8430-45ac4d6619ed       676M  2.81T      676M  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_cockroachdb_e66711a0-d4e4-4cf5-8430-45ac4d6619ed
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_crucible_054eb2e1-f3c5-489e-8fa2-a8841cc7e314          326M  2.81T      326M  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_crucible_054eb2e1-f3c5-489e-8fa2-a8841cc7e314
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_crucible_pantry_27a1b2e2-d824-4fa4-8e49-47526f29b20c   308M  2.81T      308M  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_crucible_pantry_27a1b2e2-d824-4fa4-8e49-47526f29b20c
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_internal_dns_38d8aeed-85d1-456f-8079-525242cb1e1a      329M  2.81T      329M  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_internal_dns_38d8aeed-85d1-456f-8079-525242cb1e1a
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_nexus_d89c91ef-0a16-45f3-86c3-6a098f4a93f9             200K  2.81T      200K  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_nexus_d89c91ef-0a16-45f3-86c3-6a098f4a93f9
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_ntp_08c5f361-a075-4740-b92d-db42fd8956c3               240K  2.81T      240K  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_ntp_08c5f361-a075-4740-b92d-db42fd8956c3
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_ntp_7dff3085-e1d7-4e42-ba61-d25542a04363               267M  2.81T      267M  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_ntp_7dff3085-e1d7-4e42-ba61-d25542a04363
oxp_45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_oximeter_2d621598-a72b-4524-8f94-7c3f5eec4aa6          455M  2.81T      455M  /pool/ext/45c07047-0d6b-48c3-8683-abacb200c299/crypt/zone/oxz_oximeter_2d621598-a72b-4524-8f94-7c3f5eec4aa6

@askfongjojo
Author

I've also copied to /staff/core/omicron-7373 the sled-agent log file for sled 14 (the one from which blueprint_executor got failed omicron zone PUT requests, also the same one hosting the problematic internal_dns).

@jgallagher
Contributor

jgallagher commented Jan 21, 2025

Haven't dug deeply into this, but I think there are multiple things going on here, some of which we know and some of which are new:

  1. Does internal-dns start serving requests even before it has any records at all? That would explain why it's returning "server is not authoritative for ..." results. I vaguely remember discussing this in the past, I think, but am not finding an issue for it so may be misremembering.
  2. Why are internal DNS problems causing PUT /omicron-zones requests to time out?
  3. Blueprint execution shouldn't get stuck on any single failure (i.e., failing to PUT /omicron-zones shouldn't prevent the remaining execution steps from running). This is "many Reconfigurator execution steps are fatal that shouldn't be" (#6999), which is actively being worked on. This can't be helping and is the most visible error, but I'm not sure it's actually the source of these problems: internal DNS records should be sync'd by a different background task, and sled-agent timeouts shouldn't be happening even if reconfigurator is temporarily wedged.

@jgallagher
Contributor

Digging into this one a little bit:

internal DNS records should be sync'd by a different background task

It does look like the internal DNS propagation background task is unhappy:

root@oxz_switch0:~# omdb nexus background-tasks show dns_propagation_internal
task: "dns_propagation_internal"
  configured period: every 1m
  currently executing: no
  last completed activation: iter 6968, triggered by a periodic timer firing
    started at 2025-01-21T15:40:09.993Z (58s ago) and ran for 15067ms
    attempt to propagate generation: 7

      DNS_SERVER_ADDR            LAST_RESULT
      [fd00:1122:3344:1::1]:5353 error (see below)
      [fd00:1122:3344:2::1]:5353 success
      [fd00:1122:3344:3::1]:5353 success

    error: server [fd00:1122:3344:1::1]:5353: failed to propagate DNS generation 7 to server [fd00:1122:3344:1::1]:5353: Communication Error: error sending request for url (http://[fd00:1122:3344:1::1]:5353/config): error sending request for url (http://[fd00:1122:3344:1::1]:5353/config): operation timed out

From your blueprint above, fd00:1122:3344:1::1 did belong to a now-expunged internal DNS zone (adc881b8-8e0b-4078-ac24-bc80dfee7187) but should now be assigned to zone 3c111c23-672d-42e1-834f-977a09255229. FWIW, fd00:1122:3344:2::1 is also an address that was assigned to a now-expunged zone and then reassigned to a new zone.

@jgallagher
Contributor

Trying to trace down why the DNS propagation RPW is failing: I think there's a sled-agent bug here (will open a separate issue once I'm more sure).

From sled 17 where the new internal DNS zone is running, we can get to [fd00:1122:3344:1::1]:5353 from both the gz and the internal DNS zone:

BRM23230010 # curl http://[fd00:1122:3344:1::1]:5353
{
  "request_id": "11716222-b3b8-4487-9ef3-493ec9fd48be",
  "message": "Not Found"
}
BRM23230010 # zlogin oxz_internal_dns_3c111c23-672d-42e1-834f-977a09255229 curl http://[fd00:1122:3344:1::1]:5353
{
  "request_id": "63e4793a-6b47-4d78-812c-13efc0ef045b",
  "message": "Not Found"
}

But those same requests appear to time out from the other three sleds' gzs. There is a Nexus zone running on sled 17, and if we specifically check it, we see that its DNS propagation task has succeeded:

root@oxz_switch0:~# OMDB_NEXUS_URL=http://[fd00:1122:3344:104::5]:12221 omdb nexus background-tasks show dns_propagation_internal
note: using Nexus URL http://[fd00:1122:3344:104::5]:12221
task: "dns_propagation_internal"
  configured period: every 1m
  currently executing: no
  last completed activation: iter 7028, triggered by a periodic timer firing
    started at 2025-01-21T16:40:15.828Z (38s ago) and ran for 550ms
    attempt to propagate generation: 7

      DNS_SERVER_ADDR            LAST_RESULT
      [fd00:1122:3344:1::1]:5353 success
      [fd00:1122:3344:2::1]:5353 success
      [fd00:1122:3344:3::1]:5353 success

which means all three of the DNS servers should have records to serve.

But from within the switch zone, we get three different results when querying the three DNS servers. fd00:1122:3344:1::1 times out, as expected since only sled 17 seems to be able to reach it:

root@oxz_switch0:~# host boundary-ntp.control-plane.oxide.internal fd00:1122:3344:1::1
;; communications error to fd00:1122:3344:1::1#53: timed out

fd00:1122:3344:2::1 reports no records:

root@oxz_switch0:~# host boundary-ntp.control-plane.oxide.internal fd00:1122:3344:2::1
Using domain server:
Name: fd00:1122:3344:2::1
Address: fd00:1122:3344:2::1#53
Aliases:

Host boundary-ntp.control-plane.oxide.internal not found: 2(SERVFAIL)

fd00:1122:3344:3::1 succeeds:

root@oxz_switch0:~# host boundary-ntp.control-plane.oxide.internal fd00:1122:3344:3::1
Using domain server:
Name: fd00:1122:3344:3::1
Address: fd00:1122:3344:3::1#53
Aliases:

boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:102::10
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:103::21

@jgallagher
Contributor

Why can only sled 17 get to fd00:1122:3344:1::1? From a switch zone, ddmadm get-prefixes looks suspicious:

root@oxz_switch0:~# ddmadm get-prefixes | sort
Destination              Next Hop                  Path
fd00:1122:3344:101::/64  fe80::aa40:25ff:fe04:351  BRM42220026
fd00:1122:3344:102::/64  fe80::aa40:25ff:fe04:614  BRM27230037
fd00:1122:3344:103::/64  fe80::aa40:25ff:fe04:6d3  BRM23230018
fd00:1122:3344:104::/64  fe80::aa40:25ff:fe04:851  BRM23230010
fd00:1122:3344:1::/64    fe80::aa40:25ff:fe04:351  BRM42220026
fd00:1122:3344:1::/64    fe80::aa40:25ff:fe04:851  BRM23230010
fd00:1122:3344:2::/64    fe80::aa40:25ff:fe04:351  BRM42220026
fd00:1122:3344:2::/64    fe80::aa40:25ff:fe04:614  BRM27230037
fd00:1122:3344:3::/64    fe80::aa40:25ff:fe04:6d3  BRM23230018
fdb0:a840:2504:351::/64  fe80::aa40:25ff:fe04:351  BRM42220026
fdb0:a840:2504:614::/64  fe80::aa40:25ff:fe04:614  BRM27230037
fdb0:a840:2504:6d3::/64  fe80::aa40:25ff:fe04:6d3  BRM23230018
fdb0:a840:2504:851::/64  fe80::aa40:25ff:fe04:851  BRM23230010

Both fd00:1122:3344:1::/64 and fd00:1122:3344:2::/64 show up twice with two different sleds listed as their next hop. This is why I mentioned I think there might be a sled-agent bug here. When starting an internal DNS zone, sled-agent explicitly tells maghemite that it's advertising this new address:

// If this address is in a new ipv6 prefix, notify
// maghemite so it can advertise it to other sleds.
self.advertise_prefix_of_address(*gz_address).await;

but there's no corresponding equivalent to tell maghemite to withdraw that advertisement when shutting down an internal DNS zone, AFAICT. (Off the top of my head I'm not entirely sure what that would look like, since we'd have to be careful not to withdraw a prefix if we ourselves had already started a different zone with that same prefix? I'm not sure whether that's a possible scenario.)
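
One possible shape for the withdraw side, addressing the "don't withdraw a prefix another local zone still uses" concern, is to reference-count prefixes per sled and withdraw only when the count drops to zero. This is a sketch under that assumption, not sled-agent or maghemite code; `PrefixTracker`, `Action`, `zone_started`, and `zone_stopped` are all hypothetical names.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;

/// What the caller would forward to the routing daemon (hypothetical).
#[derive(Debug, PartialEq)]
enum Action {
    Advertise([u16; 4]),
    Withdraw([u16; 4]),
    Nothing,
}

/// Counts how many local zones currently use each /64 prefix.
#[derive(Default)]
struct PrefixTracker {
    users: HashMap<[u16; 4], usize>,
}

/// The /64 prefix of an address: its first four 16-bit segments.
fn prefix64(addr: Ipv6Addr) -> [u16; 4] {
    let s = addr.segments();
    [s[0], s[1], s[2], s[3]]
}

impl PrefixTracker {
    /// Advertise only when the first zone in a prefix starts.
    fn zone_started(&mut self, addr: Ipv6Addr) -> Action {
        let p = prefix64(addr);
        let n = self.users.entry(p).or_insert(0);
        *n += 1;
        if *n == 1 { Action::Advertise(p) } else { Action::Nothing }
    }

    /// Withdraw only when the last zone in a prefix stops.
    fn zone_stopped(&mut self, addr: Ipv6Addr) -> Action {
        let p = prefix64(addr);
        let count = self.users.get(&p).copied().unwrap_or(0);
        match count {
            0 => Action::Nothing, // unknown prefix: nothing to withdraw
            1 => {
                self.users.remove(&p);
                Action::Withdraw(p)
            }
            _ => {
                self.users.insert(p, count - 1);
                Action::Nothing
            }
        }
    }
}

fn main() {
    let mut tracker = PrefixTracker::default();
    let a: Ipv6Addr = "fd00:1122:3344:1::1".parse().unwrap();
    let b: Ipv6Addr = "fd00:1122:3344:1::2".parse().unwrap();
    println!("{:?}", tracker.zone_started(a));
    println!("{:?}", tracker.zone_started(b));
    println!("{:?}", tracker.zone_stopped(a));
    println!("{:?}", tracker.zone_stopped(b));
}
```

Whether two zones on one sled can actually share a prefix is the open question above; the counter simply makes the withdraw safe either way.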

Maybe we're seeing two sleds with that prefix because the sled that was running the original fd00:1122:3344:1::1 DNS zone never withdrew its prefix? That's consistent with the blueprint: BRM23230010 is sled 17 (running the new internal DNS), and BRM42220026 is sled 61b7c861-993a-4590-9ca7-f506c59ee0a0 where the now-expunged internal DNS that had that address was running.
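
Spotting this condition by eye in the ddmadm output is easy to miss on a larger rack. As a rough sketch (a hypothetical helper, not part of ddmadm or omdb), the three-column table can be grouped by destination to list every prefix advertised by more than one sled:

```rust
use std::collections::BTreeMap;

/// The `ddmadm get-prefixes | sort` output shown above.
const DDM_PREFIXES: &str = "\
Destination              Next Hop                  Path
fd00:1122:3344:101::/64  fe80::aa40:25ff:fe04:351  BRM42220026
fd00:1122:3344:102::/64  fe80::aa40:25ff:fe04:614  BRM27230037
fd00:1122:3344:103::/64  fe80::aa40:25ff:fe04:6d3  BRM23230018
fd00:1122:3344:104::/64  fe80::aa40:25ff:fe04:851  BRM23230010
fd00:1122:3344:1::/64    fe80::aa40:25ff:fe04:351  BRM42220026
fd00:1122:3344:1::/64    fe80::aa40:25ff:fe04:851  BRM23230010
fd00:1122:3344:2::/64    fe80::aa40:25ff:fe04:351  BRM42220026
fd00:1122:3344:2::/64    fe80::aa40:25ff:fe04:614  BRM27230037
fd00:1122:3344:3::/64    fe80::aa40:25ff:fe04:6d3  BRM23230018
fdb0:a840:2504:351::/64  fe80::aa40:25ff:fe04:351  BRM42220026
fdb0:a840:2504:614::/64  fe80::aa40:25ff:fe04:614  BRM27230037
fdb0:a840:2504:6d3::/64  fe80::aa40:25ff:fe04:6d3  BRM23230018
fdb0:a840:2504:851::/64  fe80::aa40:25ff:fe04:851  BRM23230010
";

/// Group rows (destination, next hop, path) by destination and keep only
/// the destinations advertised by more than one distinct sled.
fn duplicate_prefixes(table: &str) -> BTreeMap<String, Vec<String>> {
    let mut by_dest: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for line in table.lines() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        // Skip the header and anything that is not a three-column row.
        if cols.len() != 3 || cols[0] == "Destination" {
            continue;
        }
        let sleds = by_dest.entry(cols[0].to_string()).or_default();
        if !sleds.iter().any(|s| s == cols[2]) {
            sleds.push(cols[2].to_string());
        }
    }
    by_dest.retain(|_, sleds| sleds.len() > 1);
    by_dest
}

fn main() {
    for (prefix, sleds) in duplicate_prefixes(DDM_PREFIXES) {
        println!("{prefix} advertised by {}", sleds.join(", "));
    }
}
```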

@jgallagher
Contributor

Things look even worse when we look at fd00:1122:3344:2::/64. The blueprint says that address was assigned to the now-expunged internal DNS zone on sled 104dc345-936a-4652-97da-374ae06c30e5 (aka BRM27230037 aka sled 15), and should now be assigned to the internal DNS zone on sled 61b7c861-993a-4590-9ca7-f506c59ee0a0 (aka BRM42220026 aka sled 14). But both 14 and 15 have internal DNS zones running at that same underlay address:

root@oxz_switch0:~# pilot host exec -c 'ipadm | grep internaldns' 14 15
14  BRM42220026        ok: underlay0/internaldns0 static ok        fd00:1122:3344:2::2/64
15  BRM27230037        ok: underlay0/internaldns1 static ok        fd00:1122:3344:2::2/64
root@oxz_switch0:~# pilot host exec -c 'zlogin $(zoneadm list | grep internal_dns) ipadm | grep omicron' 14 15
14  BRM42220026        ok: oxControlService20/omicron6 static ok   fd00:1122:3344:2::1/64
15  BRM27230037        ok: oxControlService0/omicron6 static ok    fd00:1122:3344:2::1/64

The DNS server on sled 14 (the new one) is the one returning no records:

BRM42220026 # host boundary-ntp.control-plane.oxide.internal fd00:1122:3344:2::1
Using domain server:
Name: fd00:1122:3344:2::1
Address: fd00:1122:3344:2::1#53
Aliases:

Host boundary-ntp.control-plane.oxide.internal not found: 2(SERVFAIL)

and the one that should be expunged on sled 15 has records:

BRM27230037 # host boundary-ntp.control-plane.oxide.internal fd00:1122:3344:2::1
Using domain server:
Name: fd00:1122:3344:2::1
Address: fd00:1122:3344:2::1#53
Aliases:

boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:102::10
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:103::21

Presumably all three Nexus zones are only getting to sled 15? That would explain why sled 14 has no records, I think (it was just started and never got a sync from any Nexus). I'm not sure how to confirm that at the maghemite layer, but using "does the DNS server have records" as a proxy it looks like that's right; all three of them get successful responses so presumably are talking to the should-be-expunged internal DNS zone on sled 15:

root@oxz_switch0:~# pilot host exec -c 'zlogin $(zoneadm list | grep nexus) host boundary-ntp.control-plane.oxide.internal fd00:1122:3344:2::1' 15-17
15  BRM27230037        ok: Using domain server:
Name: fd00:1122:3344:2::1
Address: fd00:1122:3344:2::1#53
Aliases:

boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:102::10
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:103::21
16  BRM23230018        ok: Using domain server:
Name: fd00:1122:3344:2::1
Address: fd00:1122:3344:2::1#53
Aliases:

boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:102::10
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:103::21
17  BRM23230010        ok: Using domain server:
Name: fd00:1122:3344:2::1
Address: fd00:1122:3344:2::1#53
Aliases:

boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:102::10
boundary-ntp.control-plane.oxide.internal has IPv6 address fd00:1122:3344:103::21

@jgallagher
Contributor

We met to go over this and believe we understand all of the issues above.

  1. Duplicate DDM prefixes are due to sled agent never withdrawing advertisements ([sled agent] Zone initialization causes maghemite advertisement, but nothing stops this on zone teardown #7377).
  2. The DNS "no record found" errors are due to 1 (duplicate DDM prefixes caused one of the servers to never receive data from Nexus) and bad resolver behavior ([sled-agent] Use qorb for switch zone clients #7381).
  3. PUT /omicron-zones timing out is due to sled-agent holding a lock during zone initialization ([sled-agent] PUT /omicron-zones holds a lock while initializing new zones #7379), which blocked indefinitely due to the DNS failures in 2.
  4. The duplicate running DNS zones at fd00:1122:3344:2::1 are due to expunging a sled without following the procedure to power it off and remove it first. However, this made us realize there are still cases where we could get two internal DNS servers running with the same IP, and reconfigurator should take steps to avoid that ([reconfigurator] Planner should not reassign an internal DNS IP until confirming the prior zone is gone #7380).
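
To illustrate the locking problem in item 3, here is a minimal sketch of the general pattern of not holding a lock across slow initialization: record the zone as starting, drop the guard, do the slow work, then re-acquire the lock to publish the result. It uses `std::sync::Mutex` and hypothetical names (`Zones`, `ZoneState`, `ensure_zone`); it is not how sled-agent is structured and says nothing about how #7379 will actually be fixed.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ZoneState {
    Starting,
    Running,
}

#[derive(Default)]
struct Zones {
    inner: Mutex<HashMap<String, ZoneState>>,
}

impl Zones {
    /// Mark the zone as Starting, release the lock, run the (possibly
    /// slow) `init`, then re-lock to mark it Running.
    /// Returns false if the zone was already known.
    fn ensure_zone(&self, name: &str, init: impl FnOnce()) -> bool {
        {
            let mut zones = self.inner.lock().unwrap();
            if zones.contains_key(name) {
                return false;
            }
            zones.insert(name.to_string(), ZoneState::Starting);
        } // guard dropped here, before the slow work begins
        init();
        self.inner
            .lock()
            .unwrap()
            .insert(name.to_string(), ZoneState::Running);
        true
    }

    fn state(&self, name: &str) -> Option<ZoneState> {
        self.inner.lock().unwrap().get(name).copied()
    }
}

fn main() {
    let zones = Zones::default();
    // While init runs, other callers can still take the lock.
    zones.ensure_zone("oxz_internal_dns", || {
        println!("during init: {:?}", zones.state("oxz_internal_dns"));
    });
    println!("after init: {:?}", zones.state("oxz_internal_dns"));
}
```

If the guard were held across `init`, the `state` call inside the closure would never acquire the lock, which is the same shape of stall as a request waiting behind a zone initialization that is itself blocked on DNS. A real implementation would also need to handle `init` failing, which this sketch ignores.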

Given the set of more specific issues, I'm going to close this one. Thanks again @askfongjojo!

@davepacheco
Collaborator

I just wanted to add a little more detail from my recollection. An individual internal DNS zone was expunged from a sled as part of a physical disk expungement, but as John mentioned, the system never withdrew the advertisement (#7377). Then the same internal DNS IP was put onto another sled. In terms of DNS propagation: the Nexus instances trying to propagate DNS to the new zone were either timing out (because the packets were being sent to the wrong place? I don't actually fully understand this because both zones were running) or succeeding incorrectly (because they were reaching the first zone). The result was that the newly-deployed zone wasn't getting the DNS config. This caused queries to fail with the error mentioned above. Some of those queries were happening in the "zone boot" code path, via opte_ports_needed() -- this is where #7381 comes in.

@jgallagher
Contributor

the Nexus instances trying to propagate DNS to the new zone were either timing out (because the packets were being sent to the wrong place? I don't actually fully understand this because both zones were running) or succeeding incorrectly (because they were reaching the first zone)

FWIW: they were succeeding incorrectly (i.e., they were successfully propagating records to the zone on the "expunged" sled and not talking to the newly-started zone at all).
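
A toy model of that "succeeding incorrectly" outcome, for anyone trying to picture it: two servers hold the same address, the route seen by Nexus resolves to only one of them, so propagation reports success while the other server never receives a config. Everything here (`DnsServer`, `Network`, `propagate`) is a hypothetical simplification, not Nexus or maghemite code.

```rust
use std::collections::HashMap;

/// A DNS server that can answer only after it has received a config.
#[derive(Default)]
struct DnsServer {
    generation: Option<u64>,
}

impl DnsServer {
    fn query(&self) -> Result<u64, &'static str> {
        self.generation.ok_or("SERVFAIL")
    }
}

/// Toy network: several sleds hold the same address, but the route
/// resolves that address to exactly one of them.
struct Network {
    servers: HashMap<&'static str, DnsServer>, // keyed by sled serial
    route: &'static str,                       // sled the route points at
}

impl Network {
    /// Push a config to "the" address; only the routed server sees it.
    fn propagate(&mut self, generation: u64) -> bool {
        match self.servers.get_mut(self.route) {
            Some(server) => {
                server.generation = Some(generation);
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut servers = HashMap::new();
    servers.insert("BRM27230037", DnsServer::default()); // old zone, sled 15
    servers.insert("BRM42220026", DnsServer::default()); // new zone, sled 14
    let mut net = Network { servers, route: "BRM27230037" };

    println!("propagation ok: {}", net.propagate(7));
    println!("old zone: {:?}", net.servers["BRM27230037"].query());
    println!("new zone: {:?}", net.servers["BRM42220026"].query());
}
```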
