Retry with ForcePowerOff if graceful shutdown times out #129

Merged
merged 1 commit into from
Sep 18, 2024

Conversation

defo89
Contributor

@defo89 defo89 commented Sep 17, 2024

Proposed Changes

  • I observed that when a Server is stuck in a failed PXE boot or another non-responsive state, redfish.GracefulShutdownResetType does not power off the system.
  • In that case we can retry with redfish.PushPowerButtonResetType.
  • The retry is enabled via the --enforce-power-off flag.
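
The retry described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code from this PR: `PowerController`, `PowerOff`, `waitForPowerOff`, `stuckServer`, `demo`, and the local `ResetType` constants are hypothetical stand-ins and are not the gofish or metal-operator APIs. The `enforce` parameter plays the role of the `--enforce-power-off` flag.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ResetType is a local stand-in for a Redfish reset type (hypothetical, not the gofish type).
type ResetType string

const (
	GracefulShutdown ResetType = "GracefulShutdown"
	ForceOff         ResetType = "ForceOff"
)

// PowerController is a hypothetical abstraction over a BMC client.
type PowerController interface {
	Reset(ctx context.Context, t ResetType) error
	IsPoweredOff(ctx context.Context) (bool, error)
}

// ErrPowerOffTimeout is returned when the system is still on after the timeout.
var ErrPowerOffTimeout = errors.New("timed out waiting for power off")

// waitForPowerOff polls the power state until the system is off or the timeout expires.
func waitForPowerOff(ctx context.Context, pc PowerController, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		off, err := pc.IsPoweredOff(ctx)
		if err != nil {
			return err
		}
		if off {
			return nil
		}
		if time.Now().After(deadline) {
			return ErrPowerOffTimeout
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(interval):
		}
	}
}

// PowerOff requests a graceful shutdown first. If that times out and enforce
// is set, it retries once with a forced power off.
func PowerOff(ctx context.Context, pc PowerController, enforce bool, timeout, interval time.Duration) error {
	if err := pc.Reset(ctx, GracefulShutdown); err != nil {
		return fmt.Errorf("graceful shutdown request failed: %w", err)
	}
	err := waitForPowerOff(ctx, pc, timeout, interval)
	if err == nil || !errors.Is(err, ErrPowerOffTimeout) || !enforce {
		return err
	}
	if err := pc.Reset(ctx, ForceOff); err != nil {
		return fmt.Errorf("forced power off request failed: %w", err)
	}
	return waitForPowerOff(ctx, pc, timeout, interval)
}

// stuckServer simulates a system that ignores graceful shutdown requests.
type stuckServer struct {
	off   bool
	calls []ResetType
}

func (s *stuckServer) Reset(_ context.Context, t ResetType) error {
	s.calls = append(s.calls, t)
	if t == ForceOff {
		s.off = true
	}
	return nil
}

func (s *stuckServer) IsPoweredOff(_ context.Context) (bool, error) { return s.off, nil }

// demo runs PowerOff against a stuck server with short timings.
func demo(enforce bool) ([]ResetType, bool, error) {
	s := &stuckServer{}
	err := PowerOff(context.Background(), s, enforce, 20*time.Millisecond, 5*time.Millisecond)
	return s.calls, s.off, err
}

func main() {
	calls, off, err := demo(true)
	fmt.Println(calls, off, err)
}
```

Keeping the forced retry behind an explicit switch means the default behaviour stays a plain graceful shutdown, and a timeout is still surfaced as an error when the switch is off.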

@defo89 defo89 requested a review from afritzler September 17, 2024 12:47
@defo89 defo89 requested a review from hardikdr September 17, 2024 12:52
@defo89 defo89 marked this pull request as draft September 17, 2024 12:56
@defo89
Contributor Author

defo89 commented Sep 17, 2024

Hmm, I need to revisit this. It worked on Dell, but on Lenovo I get: reset type 'PushPowerButton' is not supported by this service

2024-09-17T12:56:20.219583196Z 2024-09-17T12:56:20Z	ERROR	Reconciler error	{"controller": "server", "controllerGroup": "metal.ironcore.dev", "controllerKind": "Server", "Server": {"name":"node006-bb00-system-0"}, "namespace": "", "name": "node006-bb00-system-0", "reconcileID": "1e290486-c86f-4df4-90a5-5c55ab438ae1", "error": "failed to ensure server state transition: failed to ensure server power state: failed to power off server: failed to reset system to power on state: reset type 'PushPowerButton' is not supported by this service"}
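
One way to avoid this kind of vendor-specific failure is to check which reset types the service advertises before choosing a fallback. Redfish exposes these on the ComputerSystem.Reset action as `ResetType@Redfish.AllowableValues`. The sketch below is only an illustration: `pickForcedReset` and the local `ResetType` type are hypothetical, and the supported list in `main` is an assumed example, not values read from a real Lenovo BMC.

```go
package main

import "fmt"

// ResetType is a local stand-in for a Redfish reset type (hypothetical, not the gofish type).
type ResetType string

const (
	On               ResetType = "On"
	GracefulShutdown ResetType = "GracefulShutdown"
	ForceOff         ResetType = "ForceOff"
	PushPowerButton  ResetType = "PushPowerButton"
)

// pickForcedReset returns the first preferred reset type that appears in the
// list of reset types supported by the service.
func pickForcedReset(supported []ResetType, preferred ...ResetType) (ResetType, error) {
	set := make(map[ResetType]struct{}, len(supported))
	for _, s := range supported {
		set[s] = struct{}{}
	}
	for _, p := range preferred {
		if _, ok := set[p]; ok {
			return p, nil
		}
	}
	return "", fmt.Errorf("none of the preferred reset types %v is supported by this service", preferred)
}

func main() {
	// Assumed example: a service that does not offer PushPowerButton.
	supported := []ResetType{On, GracefulShutdown, ForceOff}
	rt, err := pickForcedReset(supported, PushPowerButton, ForceOff)
	fmt.Println(rt, err)
}
```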

@defo89 defo89 force-pushed the retry-with-power-off branch from 781c969 to 12d4346 Compare September 17, 2024 13:24
@defo89 defo89 changed the title Retry with Power off if graceful shutdown times out Retry with ForcePowerOff if graceful shutdown times out Sep 17, 2024
@defo89 defo89 marked this pull request as ready for review September 17, 2024 13:29
I observed that when a `Server` is stuck in a failed PXE boot or another non-responsive state, `redfish.GracefulShutdownResetType` does not power off the system. In that case we can retry with `redfish.ForcePowerOff`.
@defo89 defo89 force-pushed the retry-with-power-off branch from 12d4346 to 25434a0 Compare September 17, 2024 13:39
@afritzler afritzler added the enhancement New feature or request label Sep 17, 2024
@defo89 defo89 requested a review from stefanhipfel September 17, 2024 14:33
Member

@afritzler afritzler left a comment

Lgtm! Maybe @stefanhipfel can also have a look at this PR.

@stefanhipfel
Contributor

lgtm as well.
we could also use metal.ironcore.dev/operation annotation to force a power off. But maybe it is good that this is happening automatically.

@defo89
Contributor Author

defo89 commented Sep 18, 2024

> lgtm as well. we could also use metal.ironcore.dev/operation annotation to force a power off. But maybe it is good that this is happening automatically.

Good point. I see the metal.ironcore.dev/operation annotation as a troubleshooting tool for performing the operation manually, and as a way to force a power off for users who do not enable the --enforce-power-off option.

@defo89 defo89 merged commit d648ac8 into main Sep 18, 2024
10 checks passed
@defo89 defo89 deleted the retry-with-power-off branch September 18, 2024 06:36