
Handle watch errors #15

Merged
jsoriano merged 1 commit into tuenti:master from handle-watch-error on Jun 26, 2017
Conversation

jsoriano (Contributor) commented Jun 15, 2017

If a watch error is received we might have lost events, so reset the local stores and reconnect to receive everything.

One of the possible errors is that the resource version we are requesting doesn't exist anymore; the watch is then interrupted and kube2lb reconnects, and keeps reconnecting with the same error until it times out. I have seen these watch errors when reconnecting to a restarted API server in a testing scenario, but never in a production scenario with multiple API servers.

Seen while investigating #13.
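
For illustration, here is a minimal sketch of the behaviour described above, assuming the Kubernetes watch package (k8s.io/apimachinery/pkg/watch); the package, function, and callback names are hypothetical and this is not the actual kube2lb code:

package watchutil

import "k8s.io/apimachinery/pkg/watch"

// consumeUntilError reads events from a single watcher and returns as soon as
// the stream closes or a watch.Error event arrives, so the caller can reset
// its local stores and open a fresh watch instead of resuming from a
// resourceVersion that may no longer exist.
func consumeUntilError(w watch.Interface, handle func(watch.Event)) {
	for {
		e, more := <-w.ResultChan()
		if !more || e.Type == watch.Error {
			return
		}
		handle(e)
	}
}

On return the caller would clear its local state and reconnect, which is what "reset the local stores and reconnect to receive everything" refers to.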

jsoriano requested review from dpaneda and mrodm on June 15, 2017 at 12:41
jsoriano added this to the next milestone on Jun 15, 2017
jsoriano force-pushed the handle-watch-error branch from c80dc74 to 65e21b7 on June 15, 2017 at 12:58
kubernetes.go Outdated
@@ -320,13 +330,13 @@ func (c *KubernetesClient) Watch() error {
c.eventForwarder(e)
}

-	if !more {
+	if !more || e.Type == watch.Error {
Member

If a watch error is received, is the connection with the API still up?
If that is the case, should we disconnect before trying to connect again to the API? Is that managed by the connect method?

jsoriano (Contributor, Author)

What I have seen is that if we receive an error, the channel is closed, so in the next iteration more will be false and we'd reconnect anyway.
We could just wait for that next iteration, but we are reading from 3 channels: if after a reconnection we are requesting an incorrect version, we'll receive an error from all 3 channels, and then we could be iterating on errors up to three times, depending on which channel is selected on each iteration. The (not so big) problem with that is that we'd handle the error between one and three times when we already know we have to reconnect in the end. So I decided to reconnect directly on the first error.

This Watch method is getting quite big and complex; we should probably refactor it, for example to handle each watcher separately.
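
As a rough illustration of the situation described in this comment, here is a sketch of a select loop over three watchers that stops on the first closed channel or error event; the watcher names (services, endpoints, nodes) and the forward callback are hypothetical and not taken from the real Watch implementation:

package watchutil

import "k8s.io/apimachinery/pkg/watch"

// watchAll shows why reconnecting on the first watch.Error is simpler:
// after a failed reconnection all three watchers can deliver the same error,
// and without the early return the loop could handle it up to three times
// (once per channel) before finally reconnecting.
func watchAll(services, endpoints, nodes watch.Interface, forward func(watch.Event)) {
	for {
		var e watch.Event
		var more bool
		select {
		case e, more = <-services.ResultChan():
		case e, more = <-endpoints.ResultChan():
		case e, more = <-nodes.ResultChan():
		}
		if !more || e.Type == watch.Error {
			// Caller resets its local stores and reconnects once.
			return
		}
		forward(e)
	}
}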

Member

OK, as you mentioned, it seems better to try to reconnect ASAP.

Commit message:

If a watch error is received we might have lost events,
so reset local storages and reconnect to receive
everything.

Watch errors have been seen when reconnecting to a
restarted API server.
jsoriano force-pushed the handle-watch-error branch from 65e21b7 to 6854982 on June 26, 2017 at 12:50
jsoriano merged commit ac6a29f into tuenti:master on Jun 26, 2017