`IScope` retry strategy potentially insufficient #12212

mattbrailsford · 2022-04-03T14:23:28Z

mattbrailsford
Apr 3, 2022
Collaborator

I've been doing some research recently to see what would be the best persistence layer for Vendr moving forward and as part of that research I looked into EF Core. In implementing this into my codebase, I implemented it around the UoW pattern we currently use (which previously wrapped Umbracos IScope interface) and hit upon an interesting problem when enabling EF Core's retry policy feature whilst using EF Core with your own transaction, it throws an error.

InvalidOperationException: The configured execution strategy 'SqlServerRetryingExecutionStrategy' does not support user initiated transactions. Use the execution strategy returned by 'DbContext.Database.CreateExecutionStrategy()' to execute all the operations in the transaction as a retriable unit.

I raised an issue with them to ask why couldn't it work similar to how Umbraco implements retries which is ultimately to retry every command that is issued rather than the transaction as a whole which got an interesting anwser dotnet/efcore#27716

While this is possible, many transient exceptions (which are precisely the errors you want to retry) also happen to terminate the connection (e.g. network issue); in this case, we cannot retry at all. The current situation is better in that it informs users up-front about this compatibility, and points them towards the right pattern for dealing with any kind of transient exception (i.e. manual execution of the retry strategy with a block).

So allowing user-initiated transactions with retrying strategies would allow retrying for a certain class of transient exceptions (e.g. deadlocks), but it would actually make the situation worse for exceptions which break the connections, since users would no longer be aware of the problem and be guided to manual execution.

Which then ultimately got me thinking that if this is the case in EF Core, isn't this going to be the case with NPoco/Umbraco too where a custom transaction is used? And I do believe it actually is. I asked Shay on twitter to clarify my understanding which you can see here https://twitter.com/mattbrailsford/status/1510527433518329857

Ultimately, because Umbraco (and I) both use a UoW pattern with an overarching transaction for the whole unit, the retry strategy should actually be implemented at the transaction level and not the individual command level. Many transient errors will also result in the loss of a connection which, on the DB server should ultimately rollback your transaction, but from the code perspective then, the retry needs to occur from the beginning of the entire transaction.

I can't fully work out yet whether Umbraco just deals with issues that can be retried on the current connection, or whether it is incorrectly trying for all transient error types (even ones with loss of connection) but if it is the later, then it could end up putting the database into an inconsistent state due to continuing from a point where previous commands have now been rolled back.

As an answer to this, I'm wondering if the scope provider API should change from issuing disposable scopes

using (var scope = _scopeProvider.Create())
{ 
    // Do some work with scope
    scope.Complete();
}

to something more like

_sopeProvider.Execute(scope => 
{
    // Do some work with scope
   scope.Complete();
});

By providing the work to execute in the scope context as an action, it allows you to retry the entire work unit if there is a transient failure rather than just an individual command.

The only thing I'm unsure of with this approach is what it means for events as by running the whole work unit again, any "ing" style events will be raised again. This should mostly be OK, but if developers are using these events to modify other entities, then they would need to be aware that events might fire multiple times if there are transient errors.

Is my understanding of how Umbraco handles transient errors correct? And if so, does this mean the above is true and is something that needs to be investigated further?

CarlSargunar · 2022-04-04T09:55:50Z

CarlSargunar
Apr 4, 2022

Interesting idea Matt - I've been reading about (never used in production) the circuit breaker pattern around a retry library like Polly. Your point about things "publishing" or "deleting" events being raised again is one that requires some re-work, potently moving code to be triggered on completed events.

Will be interested in seeing where this goes!

1 reply

mattbrailsford Apr 4, 2022
Collaborator Author

I don't really think "ing" events are able to be moved as these are often cancellable events or allow modification of an entity before the operation is performed (think hooking into the Saving event of an entity to set the default values of a prop or something). You do want these to be fired at the exact time. "ed" events are safe to run after the transaction is completed and I do believe this is how they already work (the IScope tracks these and fires them once the scope is completed).

The biggest risk is if in Core or a developers implementation they do something in those events to some external resource that is changeable (for example, reducing stock). In those occasions it could result in things getting out of sync. But really, I don't think those types of things should be in "ing" style events anyway.

I do think something could be done with the IScope though to know which attempt this is if it's a retry, but how useful that is, I dunno. I guess any logic that ran that needed to only run once would need to run and track what attempt number they were on when they ran and if the attempt number is greater don't run again. But, I dunno. This might be overkill

mattbrailsford · 2022-04-04T10:16:31Z

mattbrailsford
Apr 4, 2022
Collaborator Author

Some interesting additional reading coming from Microsoft on how to handle transient errors found here https://docs.microsoft.com/en-us/azure/azure-sql/database/troubleshoot-common-connectivity-issues and according to their recommendations, they do suggest every retry should be against a fresh connection and so if a command fails, you shouldn't just retry the command again on the same connection, but should instead re-connect and run the command a fresh.

This would indeed suggest you would need to retry at the scope/unit level as closing the connection will kill the scopes transaction and rollback any commands that were performed successfully within it.

0 replies

dampee · 2022-04-05T14:33:22Z

dampee
Apr 5, 2022

My comment is not related to the current Umbraco IScope implementation. Only adding my experience with NPoco + EF + Transactions.

I once had troubles with different repo's combined with EntityFramework, the solution then was to use Sytem.Transactions.Transaction instead of Database.BeginTransaction (and to add the current transaction to my current scope). That way I could have different frameworks (EF core + Npoco) within a single scope. If I remember correctly, I even used nested transactions too that way.

When my repo retrieved the connection it had an additional check whether or not it had to hook into the current transaction (see schotime/NPoco#49)

// make sure the current transaction is used
if (reuseCurrentTransction == true)
{
  ((DbConnection)database.Connection).EnlistTransaction(System.Transactions.Transaction.Current);
}

So depending on the implementation of IScope, it might cause issues. :-)

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`IScope` retry strategy potentially insufficient #12212

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 3 comments 1 reply

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

IScope retry strategy potentially insufficient #12212

mattbrailsford Apr 3, 2022 Collaborator

Replies: 3 comments · 1 reply

CarlSargunar Apr 4, 2022

mattbrailsford Apr 4, 2022 Collaborator Author

mattbrailsford Apr 4, 2022 Collaborator Author

dampee Apr 5, 2022

`IScope` retry strategy potentially insufficient #12212

mattbrailsford
Apr 3, 2022
Collaborator

Replies: 3 comments 1 reply

CarlSargunar
Apr 4, 2022

mattbrailsford Apr 4, 2022
Collaborator Author

mattbrailsford
Apr 4, 2022
Collaborator Author

dampee
Apr 5, 2022