IScope
retry strategy potentially insufficient
#12212
Replies: 3 comments 1 reply
-
Interesting idea Matt - I've been reading about (never used in production) the circuit breaker pattern around a retry library like Polly. Your point about things "publishing" or "deleting" events being raised again is one that requires some re-work, potently moving code to be triggered on completed events. Will be interested in seeing where this goes! |
Beta Was this translation helpful? Give feedback.
-
Some interesting additional reading coming from Microsoft on how to handle transient errors found here https://docs.microsoft.com/en-us/azure/azure-sql/database/troubleshoot-common-connectivity-issues and according to their recommendations, they do suggest every retry should be against a fresh connection and so if a command fails, you shouldn't just retry the command again on the same connection, but should instead re-connect and run the command a fresh. This would indeed suggest you would need to retry at the scope/unit level as closing the connection will kill the scopes transaction and rollback any commands that were performed successfully within it. |
Beta Was this translation helpful? Give feedback.
-
My comment is not related to the current Umbraco IScope implementation. Only adding my experience with NPoco + EF + Transactions. I once had troubles with different repo's combined with EntityFramework, the solution then was to use Sytem.Transactions.Transaction instead of Database.BeginTransaction (and to add the current transaction to my current scope). That way I could have different frameworks (EF core + Npoco) within a single scope. If I remember correctly, I even used nested transactions too that way. When my repo retrieved the connection it had an additional check whether or not it had to hook into the current transaction (see schotime/NPoco#49)
So depending on the implementation of |
Beta Was this translation helpful? Give feedback.
-
I've been doing some research recently to see what would be the best persistence layer for Vendr moving forward and as part of that research I looked into EF Core. In implementing this into my codebase, I implemented it around the UoW pattern we currently use (which previously wrapped Umbracos
IScope
interface) and hit upon an interesting problem when enabling EF Core's retry policy feature whilst using EF Core with your own transaction, it throws an error.I raised an issue with them to ask why couldn't it work similar to how Umbraco implements retries which is ultimately to retry every command that is issued rather than the transaction as a whole which got an interesting anwser dotnet/efcore#27716
Which then ultimately got me thinking that if this is the case in EF Core, isn't this going to be the case with NPoco/Umbraco too where a custom transaction is used? And I do believe it actually is. I asked Shay on twitter to clarify my understanding which you can see here https://twitter.com/mattbrailsford/status/1510527433518329857
Ultimately, because Umbraco (and I) both use a UoW pattern with an overarching transaction for the whole unit, the retry strategy should actually be implemented at the transaction level and not the individual command level. Many transient errors will also result in the loss of a connection which, on the DB server should ultimately rollback your transaction, but from the code perspective then, the retry needs to occur from the beginning of the entire transaction.
I can't fully work out yet whether Umbraco just deals with issues that can be retried on the current connection, or whether it is incorrectly trying for all transient error types (even ones with loss of connection) but if it is the later, then it could end up putting the database into an inconsistent state due to continuing from a point where previous commands have now been rolled back.
As an answer to this, I'm wondering if the scope provider API should change from issuing disposable scopes
to something more like
By providing the work to execute in the scope context as an action, it allows you to retry the entire work unit if there is a transient failure rather than just an individual command.
The only thing I'm unsure of with this approach is what it means for events as by running the whole work unit again, any "ing" style events will be raised again. This should mostly be OK, but if developers are using these events to modify other entities, then they would need to be aware that events might fire multiple times if there are transient errors.
Is my understanding of how Umbraco handles transient errors correct? And if so, does this mean the above is true and is something that needs to be investigated further?
Beta Was this translation helpful? Give feedback.
All reactions