Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement non-graceful shutdown #2702

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

implement non-graceful shutdown #2702

wants to merge 4 commits into from

Conversation

YZ775
Copy link
Contributor

@YZ775 YZ775 commented Oct 24, 2024

@YZ775 YZ775 force-pushed the non-graceful-shutdown branch 3 times, most recently from 5b4e8b8 to eb0c558 Compare October 28, 2024 00:56
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch 3 times, most recently from b173db3 to 3768397 Compare November 7, 2024 04:52
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch from 69b7b44 to 800de3e Compare November 12, 2024 05:06
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch 2 times, most recently from 806283a to bc98450 Compare November 26, 2024 01:51
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch from 507deec to e405bd0 Compare December 23, 2024 04:08
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch 5 times, most recently from 7b740ab to 151d645 Compare January 16, 2025 01:53
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch 7 times, most recently from a5ee9a1 to 0cd28be Compare February 3, 2025 02:17
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch 3 times, most recently from f0e57db to 42b10c7 Compare February 6, 2025 08:15
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch 2 times, most recently from b30cabb to 801f1dc Compare February 13, 2025 07:32
@YZ775 YZ775 marked this pull request as ready for review February 13, 2025 07:48
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch from 801f1dc to 93c94be Compare February 13, 2025 07:56
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch from 93c94be to 8c1c789 Compare February 13, 2025 08:14
@YZ775 YZ775 requested a review from masa213f February 14, 2025 04:43
@@ -42,11 +42,15 @@ repair:
command_retries: 2
command_interval: 30
watch_seconds: 600
- repair_command: ["sh", "-c", "neco non-graceful-node-shutdown shutdown $1", "non-graceful-node-shutdown"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we need not use sh -c.
neco non-graceful-node-shutdown shutdon/cleanup command will work just add the IP Address to the end of the command.

Suggested change
- repair_command: ["sh", "-c", "neco non-graceful-node-shutdown shutdown $1", "non-graceful-node-shutdown"]
- repair_command: ["neco", "non-graceful-node-shutdown", "shutdown"]

Comment on lines 28 to 29
Short: "non-Graceful Node Shutdown related commands",
Long: `non-Graceful Node Shutdown related commands.`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use capital letters?

Suggested change
Short: "non-Graceful Node Shutdown related commands",
Long: `non-Graceful Node Shutdown related commands.`,
Short: "Non-Graceful Node Shutdown related commands",
Long: `Non-Graceful Node Shutdown related commands.`,

Comment on lines 35 to 38
powerCheckCmd := exec.Command("neco", "power", "status", node)
var out bytes.Buffer
powerCheckCmd.Stdout = &out
err = powerCheckCmd.Run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, Cmd.Output() is enough.

Suggested change
powerCheckCmd := exec.Command("neco", "power", "status", node)
var out bytes.Buffer
powerCheckCmd.Stdout = &out
err = powerCheckCmd.Run()
out, err := exec.Command("neco", "power", "status", node).Output()

https://pkg.go.dev/os/exec#Cmd.Output


type CephCluster struct {
Name string
NameSpace string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context of Kubernetes, Namespace is a proper noun.
https://kubernetes.io/docs/reference/glossary/?fundamental=true#term-namespace

Suggested change
NameSpace string
Namespace string

if err != nil {
return err
}
sabakanStatus := machines[0].Status.State
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not confident, so please tell me.
Is it necessary to check the sabakan status before unfencing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sabakanStatus is used in L90.
In the end of cleanup, we need to untaint the node if repair of the machine is succeeded and become healthy.

Comment on lines 43 to 44
poweroffCmd := exec.Command("neco", "power", "stop", node)
err = poweroffCmd.Run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it's better to run command directly. In this case, it's enough.

Suggested change
poweroffCmd := exec.Command("neco", "power", "stop", node)
err = poweroffCmd.Run()
err := exec.Command("neco", "power", "stop", node).Run()

Comment on lines 91 to 92
kubernetesNode := &corev1.Node{}
err = kubeClient.Get(ctx, client.ObjectKey{Name: node}, kubernetesNode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be an error with the spare machines, won't it?

}
err = kubeClient.Create(ctx, &networkFence)
if err != nil {
if client.IgnoreAlreadyExists(err) == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, in this case IsAlreadyExists is readable.

Suggested change
if client.IgnoreAlreadyExists(err) == nil {
if apierrors.IsAlreadyExists(err) {

return err
}

// Add taint to the node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a command is executed for spare machines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgotten to implement the case if the target is spare machine.
There are no need to do non-graceful shutdown against non-k8s node, so I'll fix the command to skip procedures.

Long: `non-Graceful Node Shutdown related commands.`,
}

type CephCluster struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. But, IMO, NamespacedName is enough.
https://pkg.go.dev/k8s.io/apimachinery/pkg/types#NamespacedName

@YZ775
Copy link
Contributor Author

YZ775 commented Feb 17, 2025

I'm checking that the fix works fine in CI now.
https://github.com/cybozu-private/neco-apps/actions/runs/13364264101

@YZ775 YZ775 force-pushed the non-graceful-shutdown branch 2 times, most recently from ef648bc to 0069a33 Compare February 17, 2025 07:36
@YZ775 YZ775 requested a review from masa213f February 17, 2025 07:57
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch 3 times, most recently from 2eb352d to e5cc3f6 Compare February 20, 2025 07:01
@YZ775
Copy link
Contributor Author

YZ775 commented Feb 20, 2025

@YZ775 YZ775 force-pushed the non-graceful-shutdown branch from e5cc3f6 to 7b00171 Compare February 20, 2025 07:29
@YZ775 YZ775 force-pushed the non-graceful-shutdown branch from 7b00171 to d80e821 Compare February 21, 2025 01:52
@YZ775
Copy link
Contributor Author

YZ775 commented Feb 25, 2025

CI was passed.
https://github.com/cybozu-private/neco-apps/actions/runs/13513549671/job/37758134998
@masa213f
This PR is ready for review. Sorry for late.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants