
Handle watch errors #15

Merged
jsoriano merged 1 commit into tuenti:master from handle-watch-error on Jun 26, 2017
Conversation

jsoriano (Contributor) commented Jun 15, 2017

If a watch error is received we might have lost events, so reset the local stores and reconnect to receive everything.

One of the possible errors is that the resource version we are requesting doesn't exist anymore; the watch is then interrupted and kube2lb reconnects, and keeps reconnecting with the same error until it times out. I have seen these watch errors when reconnecting to a restarted API server in a testing scenario, but never in a production scenario with multiple API servers.

Seen while investigating #13.
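
For illustration, here is a minimal sketch of the behaviour described above, assuming the Kubernetes watch package (k8s.io/apimachinery/pkg/watch); the package, function, and callback names are hypothetical and this is not the actual kube2lb code:

package watchutil

import "k8s.io/apimachinery/pkg/watch"

// consumeUntilError reads events from a single watcher and returns as soon as
// the stream closes or a watch.Error event arrives, so the caller can reset
// its local stores and open a fresh watch instead of resuming from a
// resourceVersion that may no longer exist.
func consumeUntilError(w watch.Interface, handle func(watch.Event)) {
	for {
		e, more := <-w.ResultChan()
		if !more || e.Type == watch.Error {
			return
		}
		handle(e)
	}
}

On return the caller would clear its local state and reconnect, which is what "reset the local stores and reconnect to receive everything" refers to.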

jsoriano requested review from dpaneda and mrodm on June 15, 2017 at 12:41
jsoriano added this to the next milestone on Jun 15, 2017
jsoriano force-pushed the handle-watch-error branch from c80dc74 to 65e21b7 on June 15, 2017 at 12:58
kubernetes.go Outdated
@@ -320,13 +330,13 @@ func (c *KubernetesClient) Watch() error {
c.eventForwarder(e)
}

-	if !more {
+	if !more || e.Type == watch.Error {
Member

If a watch error is received, is the connection with the API still up?
If that is the case, should we disconnect before trying to connect again to the API? Is that managed by the connect method?

jsoriano (Contributor, Author)

What I have seen is that if we receive an error, the channel is closed, so in the next iteration more will be false and we'd reconnect anyway.
We could just wait for that next iteration, but we are reading from 3 channels: if after a reconnection we are requesting an incorrect version, we'll receive an error from all 3 channels, and then we could be iterating on errors up to three times, depending on which channel is selected on each iteration. The (not so big) problem with that is that we'd handle the error between one and three times when we already know we have to reconnect in the end. So I decided to reconnect directly on the first error.

This Watch method is getting quite big and complex; we should probably refactor it, for example to handle each watcher separately.
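
As a rough illustration of the situation described in this comment, here is a sketch of a select loop over three watchers that stops on the first closed channel or error event; the watcher names (services, endpoints, nodes) and the forward callback are hypothetical and not taken from the real Watch implementation:

package watchutil

import "k8s.io/apimachinery/pkg/watch"

// watchAll shows why reconnecting on the first watch.Error is simpler:
// after a failed reconnection all three watchers can deliver the same error,
// and without the early return the loop could handle it up to three times
// (once per channel) before finally reconnecting.
func watchAll(services, endpoints, nodes watch.Interface, forward func(watch.Event)) {
	for {
		var e watch.Event
		var more bool
		select {
		case e, more = <-services.ResultChan():
		case e, more = <-endpoints.ResultChan():
		case e, more = <-nodes.ResultChan():
		}
		if !more || e.Type == watch.Error {
			// Caller resets its local stores and reconnects once.
			return
		}
		forward(e)
	}
}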

Member

OK, as you mentioned, it seems better to try to reconnect ASAP.

Commit message:

If a watch error is received we might have lost events,
so reset local storages and reconnect to receive
everything.

Watch errors have been seen when reconnecting to a
restarted API server.
jsoriano force-pushed the handle-watch-error branch from 65e21b7 to 6854982 on June 26, 2017 at 12:50
jsoriano merged commit ac6a29f into tuenti:master on Jun 26, 2017