add a drain timeout #351

Closed
wants to merge 1 commit into from
flbla commented Apr 12, 2021

Issue: #78
Rebased on the main branch instead of master (previous PR: #283)

flbla (Author) commented Apr 12, 2021

@evrardjp @dholbach

evrardjp (Collaborator) commented Apr 12, 2021

Awesome! Thanks @flbla. Can you fix the Go test?

evrardjp (Collaborator) left a comment

A few changes might be required, and this might conflict with an existing PR. It might be worth merging the two approaches.

}
if err := kubectldrain.RunCordonOrUncordon(drainer, node, true); err != nil {
log.Fatalf("Error cordonning %s: %v", nodename, err)
}

if err := kubectldrain.RunNodeDrain(drainer, nodename); err != nil {
log.Fatalf("Error draining %s: %v", nodename, err)
log.Error("Error draining %s: %v", nodename, err)

Collaborator

This should probably be Errorf

if isDrained {
invokeReboot(nodeID, rebootCommand)
for {
log.Infof("Waiting for reboot")

Collaborator

This should probably be log.Info

} else {
log.Infof("Uncordon %s", node.GetName())
uncordon(client, node)
deleteFlag := newCommand("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/rm", rebootSentinelFile)

Collaborator

I think it would be best to use the available functions for wrapping things properly (so that it works on all OSes).

Next, what about the cases where the sentinel is a command instead of a sentinel file?

}
if err := kubectldrain.RunCordonOrUncordon(drainer, node, true); err != nil {
log.Fatalf("Error cordonning %s: %v", nodename, err)
}

if err := kubectldrain.RunNodeDrain(drainer, nodename); err != nil {
log.Fatalf("Error draining %s: %v", nodename, err)
log.Error("Error draining %s: %v", nodename, err)
return false

Collaborator

Another PR has grown a bit and touches this code as well; maybe it's worth combining efforts?

Can you have a look at https://github.com/weaveworks/kured/pull/341/files ?

Author

I may have missed something, but I don't think #341 uncordons the nodes after the timeout?

evrardjp (Collaborator) commented Apr 14, 2021

Correct.

Here, I think the error handling can indeed be improved. We could change the function signature to return the error and maybe the result.

If an error happens, log it. If forceReboot isn't true, we should probably uncordon and continue the main loop. Else, the control flow continues. That makes it far clearer to read, IMO.

@fouadsemaan

The docs claim this change was put in as part of release 1.7.0. Is there an ETA?

@evrardjp (Collaborator)

@fouadsemaan TBH, I am not sure this PR is necessary anymore. It only adds extra details on top of a feature that has already been merged.
I would love to hear your opinion and @flbla's on this.

Because the other feature has already been merged, you might be interested in it, @fouadsemaan.

flbla (Author) commented Jul 28, 2021

Hi,
I'm not sure whether the nodes are actually uncordoned when a drain timeout happens.
If not, this PR is still necessary.

@github-actions

This PR was automatically considered stale due to lack of activity. Please refresh it and/or join our slack channels to highlight it, before it automatically closes (in 7 days).
