Unrecoverable NotAuthenticatedException during cluster upgrade #25

Open
megakid opened this issue Nov 25, 2021 · 1 comment

megakid commented Nov 25, 2021

Describe the bug
We cannot reproduce this reliably, but when upgrading our 3-node UAT clusters from V5 to V21, we noticed that some of our services, which we expected to reconnect automatically (as with a master failover), started spamming the logs heavily, using high CPU, etc.

It seems the client-side EventStoreConnection gets into a state where the connection is marked as not authenticated (although the credentials did not change during the cluster rollout). From this state, the connection object is unrecoverable and needs to be recreated; we did this with a service restart (everything works after a restart).

We have noticed this behaviour in more than one service and across a couple of our clusters. Our educated guess is that about 10% of the ES clients connected to the clusters we upgraded have suffered this issue, with the other 90% reconnecting perfectly and continuing to subscribe/read/append to streams.

To Reproduce
Steps to reproduce the behavior:

  1. Service running with persistent subscriptions
  2. Upgrade the 3-node cluster to V21 by (as per the v5 -> v21 upgrade notice) shutting down all nodes, then rolling out v21 nodes + config (keep credentials the same)
  3. Observe that most of the time the clients re-establish the connection, while in a minority of cases they get into a client-side auth state which prevents recovery.

Expected behavior
Clients to reconnect without auth issues

Actual behavior
As above.

Config/Logs/Screenshots
Stack traces are from a few common operations:

EventStore.ClientAPI.Exceptions.NotAuthenticatedException: Not Authenticated
   at async Task<WriteResult> EventStore.ClientAPI.Internal.EventStoreNodeConnection.AppendToStreamAsync(string stream, long expectedVersion, IEnumerable<EventData> events, UserCredentials userCredentials)
EventStore.ClientAPI.Exceptions.NotAuthenticatedException: Not Authenticated
   at async Task<EventStorePersistentSubscriptionBase> EventStore.ClientAPI.EventStorePersistentSubscriptionBase.Start()

EventStore details

  • EventStore server version:
    21.10
  • Operating system:
    Windows
  • EventStore client version (if applicable):
    21.2.0

megakid commented Nov 25, 2021

We think this is likely because we haven't set the RetryAuthenticationOnTimeout flag. I do think that if DefaultUserCredentials are set, the connection should not be allowed to proceed to ConnectingPhase.Identification unless ConnectingPhase.Authentication completes successfully.
Without that check, transient errors (e.g. a timeout) that aren't surfaced to user code, except via the AuthenticationFailed event, are silently ignored and cause unexpected, unrecoverable behaviour for the lifetime of the EventStore client object. The addition of RetryAuthenticationOnTimeout seems to mitigate one failure mode but, if I understand the current code correctly, if the server responds with NotAuthenticated, the client still continues to connect.
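
To make the proposal concrete, here is a rough, language-neutral sketch of the phase transition I have in mind, written as a small Python model rather than actual client code. The `AUTHENTICATION` and `IDENTIFICATION` phases mirror the ConnectingPhase values mentioned above; everything else (`AuthResult`, `RECONNECTING`, `after_auth_attempt`, and the `retry_on_timeout` / `require_auth` parameters) is a hypothetical name made up for illustration, and the lenient default only reflects my reading of the current behaviour, which may be wrong.

```python
from enum import Enum, auto


class Phase(Enum):
    AUTHENTICATION = auto()   # mirrors ConnectingPhase.Authentication
    IDENTIFICATION = auto()   # mirrors ConnectingPhase.Identification
    RECONNECTING = auto()     # hypothetical: drop the connection and start over


class AuthResult(Enum):       # hypothetical outcomes of one authentication attempt
    OK = auto()
    TIMEOUT = auto()
    NOT_AUTHENTICATED = auto()


def after_auth_attempt(result, has_default_credentials,
                       retry_on_timeout=False, require_auth=False):
    """Return the phase that follows one authentication attempt.

    retry_on_timeout models a flag like RetryAuthenticationOnTimeout.
    require_auth models the proposed check: with default credentials set,
    only a successful authentication may move on to identification.
    """
    # No credentials to check, or the check passed: carry on as usual.
    if not has_default_credentials or result is AuthResult.OK:
        return Phase.IDENTIFICATION
    # A timeout can be retried when the retry flag is on.
    if result is AuthResult.TIMEOUT and retry_on_timeout:
        return Phase.AUTHENTICATION
    # Proposed behaviour: never identify on an unauthenticated connection.
    if require_auth:
        return Phase.RECONNECTING
    # Lenient behaviour (my reading of the current code): the failure is ignored.
    return Phase.IDENTIFICATION
```

In this model, `after_auth_attempt(AuthResult.NOT_AUTHENTICATED, True, retry_on_timeout=True)` still yields `Phase.IDENTIFICATION`, which is the gap described above; only `require_auth=True` closes it. What the client should do instead of identifying (reconnect, surface an error, close) is an open design choice, and `RECONNECTING` is just one assumption.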
