Events occurring while the target ESDB connection is down are not synced once the target comes back online. I have noticed that the events are synced if the replicator is restarted.
Steps to reproduce the behavior (a small verification sketch follows the list):
1. Set up three containers: A (ESDB), Central (ESDB), and the Replicator, with A as the source and Central as the target, connected over gRPC.
2. Shut down Central.
3. Add an event to A.
4. Bring Central back online.
5. Check whether the event has been written to Central (it won't be).
6. Restart the Replicator container.
7. Check whether the event now exists in Central (it will be).
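For illustration, here is a minimal sketch of steps 3, 5 and 7 using the `esdbclient` Python package (an assumption for this example; the URIs, ports and stream name are placeholders):

```python
# Sketch of steps 3, 5 and 7, assuming the `esdbclient` Python package and
# insecure single-node endpoints; adjust the URIs and stream name as needed.
from esdbclient import EventStoreDBClient, NewEvent, StreamState

A_URI = "esdb://localhost:2113?tls=false"        # source (A) - placeholder
CENTRAL_URI = "esdb://localhost:2114?tls=false"  # target (Central) - placeholder
STREAM = "replication-test"                      # placeholder stream name

# Step 3: append a test event to A while Central is down.
source = EventStoreDBClient(uri=A_URI)
source.append_to_stream(
    STREAM,
    current_version=StreamState.ANY,
    events=[NewEvent(type="TestEvent", data=b'{"n": 1}')],
)

# Steps 5 and 7: once Central is back up, check whether the event arrived.
target = EventStoreDBClient(uri=CENTRAL_URI)
try:
    events = target.get_stream(STREAM)
    print(f"Central has {len(events)} event(s) in {STREAM}")
except Exception as exc:  # the client raises NotFound if the stream was never replicated
    print(f"Stream not found on Central: {exc}")
```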
Expected behavior
Once the target container comes back online, the events are synced between A and Central without restarting the Replicator.
Actual behavior
The sink pipe is stuck in an infinite loop because it still holds events it has not been able to write.
Config/Logs/Screenshots
{"@t":"2022-03-08T00:30:03.0674506Z","@m":"Waiting for the sink pipe to exhaust (2 left)...","@i":"06fcac33","Left":2,"SourceContext":"EventStore.Replicator.Replicator"}
{"@t":"2022-03-08T00:30:04.0799980Z","@m":"Waiting for the sink pipe to exhaust (2 left)...","@i":"06fcac33","Left":2,"SourceContext":"EventStore.Replicator.Replicator"}
{"@t":"2022-03-08T00:30:05.0786852Z","@m":"Waiting for the sink pipe to exhaust (2 left)...","@i":"06fcac33","Left":2,"SourceContext":"EventStore.Replicator.Replicator"}
{"@t":"2022-03-08T00:30:06.0794062Z","@m":"Waiting for the sink pipe to exhaust (2 left)...","@i":"06fcac33","Left":2,"SourceContext":"EventStore.Replicator.Replicator"}
{"@t":"2022-03-08T00:30:07.0757618Z","@m":"Waiting for the sink pipe to exhaust (2 left)...","@i":"06fcac33","Left":2,"SourceContext":"EventStore.Replicator.Replicator"}
I believe we found the root cause, and there are options to fix this, one relatively easy and one more complex.
It seems like the whole thing dies silently inside the Shovel and never comes back alive.
To me (still to be validated), the simplest fix would be an external health check for both ESDB instances (source and target) that marks the service unhealthy based on the cluster status; the orchestrator would then restart the service.
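As a rough sketch of what such an external probe could look like (assuming EventStoreDB's HTTP `/health/live` endpoint; the node URLs are placeholders), the orchestrator's health check could run something like:

```python
#!/usr/bin/env python3
# Sketch of an external liveness probe: exits non-zero if either ESDB node
# (source or target) fails its HTTP health endpoint, so the orchestrator
# (Docker/Kubernetes) marks the replicator unhealthy and restarts it.
# Assumes EventStoreDB's /health/live endpoint; the URLs are placeholders.
import sys
import urllib.request

NODES = {
    "source": "http://esdb-a:2113/health/live",        # placeholder
    "target": "http://esdb-central:2113/health/live",  # placeholder
}

def is_live(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False

failures = [name for name, url in NODES.items() if not is_live(url)]
if failures:
    print(f"Unhealthy ESDB node(s): {', '.join(failures)}", file=sys.stderr)
    sys.exit(1)
print("Both ESDB nodes are live")
```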
There's an obvious risk there: if the ESDB cluster comes back online relatively quickly, the health check will succeed and the service won't be restarted, even though the stalled sink still needs that restart to recover. So a rather strict restart policy would have to be configured.
A more complex solution is to actually handle the sink failure properly (catch the exception) and add an internal health check that reports the service as unhealthy when the sink stops writing.
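The replicator itself is a .NET service, but as a language-neutral sketch of that "inside" approach (hypothetical names, not the replicator's actual API): retry failed sink writes with backoff instead of letting the pipeline die silently, and flag the pipeline as unhealthy when writes have stalled for too long.

```python
# Conceptual sketch only (hypothetical names, not the replicator's real API):
# retry a failed sink write with exponential backoff instead of dying silently,
# and track the time of the last successful write so an internal health
# endpoint can report the service as unhealthy when the sink has stalled.
import time

MAX_BACKOFF_SECONDS = 30
STALL_THRESHOLD_SECONDS = 120

class SinkHealth:
    def __init__(self) -> None:
        self.last_successful_write = time.monotonic()

    @property
    def healthy(self) -> bool:
        return time.monotonic() - self.last_successful_write < STALL_THRESHOLD_SECONDS

def write_with_retry(write_event, event, health: SinkHealth) -> None:
    """Keep retrying the sink write until it succeeds."""
    backoff = 1.0
    while True:
        try:
            write_event(event)  # e.g. an append to the target ESDB
            health.last_successful_write = time.monotonic()
            return
        except Exception as exc:  # target down, timeout, etc.
            print(f"Sink write failed ({exc}); retrying in {backoff:.0f}s")
            time.sleep(backoff)
            backoff = min(backoff * 2, MAX_BACKOFF_SECONDS)
```

An internal health endpoint returning `health.healthy` would let the orchestrator restart the service only when the sink is genuinely stuck, avoiding the false-positive risk of the purely external check described above.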
EventStore details
This is a really great product btw!
DEV-76