-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement non-graceful shutdown #2702
base: main
Are you sure you want to change the base?
Conversation
5b4e8b8
to
eb0c558
Compare
b173db3
to
3768397
Compare
69b7b44
to
800de3e
Compare
806283a
to
bc98450
Compare
507deec
to
e405bd0
Compare
7b740ab
to
151d645
Compare
a5ee9a1
to
0cd28be
Compare
f0e57db
to
42b10c7
Compare
b30cabb
to
801f1dc
Compare
801f1dc
to
93c94be
Compare
93c94be
to
8c1c789
Compare
etc/cke-template.yml
Outdated
@@ -42,11 +42,15 @@ repair: | |||
command_retries: 2 | |||
command_interval: 30 | |||
watch_seconds: 600 | |||
- repair_command: ["sh", "-c", "neco non-graceful-node-shutdown shutdown $1", "non-graceful-node-shutdown"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, we need not use sh -c
.
neco non-graceful-node-shutdown shutdon/cleanup
command will work just add the IP Address to the end of the command.
- repair_command: ["sh", "-c", "neco non-graceful-node-shutdown shutdown $1", "non-graceful-node-shutdown"] | |
- repair_command: ["neco", "non-graceful-node-shutdown", "shutdown"] |
Short: "non-Graceful Node Shutdown related commands", | ||
Long: `non-Graceful Node Shutdown related commands.`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use capital letters?
Short: "non-Graceful Node Shutdown related commands", | |
Long: `non-Graceful Node Shutdown related commands.`, | |
Short: "Non-Graceful Node Shutdown related commands", | |
Long: `Non-Graceful Node Shutdown related commands.`, |
powerCheckCmd := exec.Command("neco", "power", "status", node) | ||
var out bytes.Buffer | ||
powerCheckCmd.Stdout = &out | ||
err = powerCheckCmd.Run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, Cmd.Output()
is enough.
powerCheckCmd := exec.Command("neco", "power", "status", node) | |
var out bytes.Buffer | |
powerCheckCmd.Stdout = &out | |
err = powerCheckCmd.Run() | |
out, err := exec.Command("neco", "power", "status", node).Output() |
|
||
type CephCluster struct { | ||
Name string | ||
NameSpace string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the context of Kubernetes, Namespace
is a proper noun.
https://kubernetes.io/docs/reference/glossary/?fundamental=true#term-namespace
NameSpace string | |
Namespace string |
if err != nil { | ||
return err | ||
} | ||
sabakanStatus := machines[0].Status.State |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not confident, so please tell me.
Is it necessary to check the sabakan status before unfencing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sabakanStatus
is used in L90.
In the end of cleanup, we need to untaint the node if repair of the machine is succeeded and become healthy.
poweroffCmd := exec.Command("neco", "power", "stop", node) | ||
err = poweroffCmd.Run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it's better to run command directly. In this case, it's enough.
poweroffCmd := exec.Command("neco", "power", "stop", node) | |
err = poweroffCmd.Run() | |
err := exec.Command("neco", "power", "stop", node).Run() |
kubernetesNode := &corev1.Node{} | ||
err = kubeClient.Get(ctx, client.ObjectKey{Name: node}, kubernetesNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be an error with the spare machines, won't it?
} | ||
err = kubeClient.Create(ctx, &networkFence) | ||
if err != nil { | ||
if client.IgnoreAlreadyExists(err) == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, in this case IsAlreadyExists is readable.
if client.IgnoreAlreadyExists(err) == nil { | |
if apierrors.IsAlreadyExists(err) { |
return err | ||
} | ||
|
||
// Add taint to the node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if a command is executed for spare machines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgotten to implement the case if the target is spare machine.
There are no need to do non-graceful shutdown against non-k8s node, so I'll fix the command to skip procedures.
Long: `non-Graceful Node Shutdown related commands.`, | ||
} | ||
|
||
type CephCluster struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine. But, IMO, NamespacedName
is enough.
https://pkg.go.dev/k8s.io/apimachinery/pkg/types#NamespacedName
I'm checking that the fix works fine in CI now. |
ef648bc
to
0069a33
Compare
2eb352d
to
e5cc3f6
Compare
e5cc3f6
to
7b00171
Compare
7b00171
to
d80e821
Compare
CI was passed. |
test is implemented in neco-apps
https://github.com/cybozu-private/neco-apps/pull/5795
https://github.com/cybozu-private/neco-apps/actions/workflows/dctest-with-neco-feature-branch.yaml