
BUGFIX: Add exponential backoff when reading from Redis #3344

Open · kdambekalns wants to merge 2 commits into base: 8.3

Conversation

kdambekalns (Member):

When Redis is not (yet) ready while Flow tries to read from the cache, this change makes Flow wait and retry up to 8 times, with an exponentially growing back-off time between attempts.

Fixes #3284
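
For illustration, here is a minimal sketch of what such a read with exponential back-off can look like (the base delay of 100 ms and the exact loop shape are assumptions for illustration, not necessarily the exact code in this PR):

```php
for ($attempt = 0; $attempt <= 8; $attempt++) {
    try {
        return $this->uncompress($this->redis->get($this->getPrefixedIdentifier('entry:' . $entryIdentifier)));
    } catch (\RedisException $exception) {
        if ($attempt === 8) {
            // all retries exhausted, let the exception surface
            throw $exception;
        }
        // back off exponentially: 0.1s, 0.2s, 0.4s, ... between attempts
        usleep((2 ** $attempt) * 100000);
    }
}
```

With a base delay of b and 8 retries, the worst-case wait before the exception surfaces is (2^8 − 1) · b, roughly 25.5 s for an assumed b = 100 ms; this is the order of magnitude behind the "20s+ delay" concern raised below.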

Review instructions

This might be hard to check in real life, sorry…

Checklist

  • Code follows the PSR-2 coding style
  • Tests have been created, run and adjusted as needed
  • The PR is created against the lowest maintained branch
  • Reviewer - PR Title is brief but complete and starts with FEATURE|TASK|BUGFIX
  • Reviewer - The first section explains the change briefly for change-logs
  • Reviewer - Breaking Changes are marked with !!! and have upgrade-instructions

```php
do {
    try {
        return $this->uncompress($this->redis->get($this->getPrefixedIdentifier('entry:' . $entryIdentifier)));
    } catch (\RedisException $exception) {
```
Member:

Seems dangerous to me… Isn't there at least a more specific exception that is thrown when "Redis is not ready (yet)"?

Member Author:

No, you can only inspect the error message.

But then again, even if it's something else, why not try again? What could be dangerous?

Member:

I'm not sure, but I've seen so many errors in the past caused by silently caught exceptions that they make me suspicious.
Also, we have seen performance issues with exponential back-offs in the past (and are replacing those in Neos 9). For example: some error in Redis will now lead to a 20s+ delay (if I counted correctly) until it is displayed. A wrong configuration could easily build up and kill the server upon deployment.

What about moving this logic to the getStatus() implementation and making sure that it is called upon deployment?
And/or implementing the WithSetupInterface and putting it there?
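
For context, a rough sketch of that getStatus() alternative, assuming the backend implements Flow's WithStatusInterface and uses a simple ping as the readiness probe (both assumptions for illustration):

```php
public function getStatus(): \Neos\Error\Messages\Result
{
    $result = new \Neos\Error\Messages\Result();
    try {
        // ping() throws a \RedisException if the server is unreachable
        $this->redis->ping();
    } catch (\RedisException $exception) {
        $result->addError(new \Neos\Error\Messages\Error('Redis is not reachable: ' . $exception->getMessage()));
    }
    return $result;
}
```

A deployment pipeline could then poll this status until it reports no errors before switching traffic, instead of retrying inside every cache read.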

Member Author:

Yeah, maybe logging the exceptions would be good – after all, at that point nothing should go wrong.

About killing something on a deployment: The use case for me here is exactly a deployment failure. 🙃

Adding something else that has to be called explicitly won't happen (in "my" current project), as we already have that workaround: call doctrine:compileproxies after deployment until no error appears… 🙈

Member:

> About killing something on a deployment: The use case for me here is exactly a deployment failure.

Right, it fixes the error you described in the issue, but it might create a new one for misconfigured backends.
I won't block this, but I'm not a big fan, as you might be able to tell :)
What about turning this into a composition, i.e. introducing some RetryBackend that can be wrapped around any other backend (similar to the MultiBackend)?
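
A sketch of what such a composition could look like; the RetryBackend name, its constructor options and the \Throwable catch are assumptions here, and only get() is shown (the remaining BackendInterface methods would delegate the same way):

```php
use Neos\Cache\Backend\BackendInterface;

class RetryBackend
{
    public function __construct(
        protected BackendInterface $backend,
        protected int $maxRetries = 8,
        protected int $baseDelayMicroseconds = 100000
    ) {
    }

    public function get(string $entryIdentifier)
    {
        for ($attempt = 0; ; $attempt++) {
            try {
                return $this->backend->get($entryIdentifier);
            } catch (\Throwable $throwable) {
                if ($attempt >= $this->maxRetries) {
                    throw $throwable;
                }
                // exponential back-off between attempts
                usleep((2 ** $attempt) * $this->baseDelayMicroseconds);
            }
        }
    }

    // set(), has(), remove(), flush(), ... would delegate to $this->backend likewise
}
```

Catching \Throwable is the price of being backend-agnostic: a generic wrapper cannot know the concrete backend's exception types, which circles back to the "silently caught exceptions" concern above.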

Member Author:

That is also an option…

Member:

I think that the PDO and Redis backends should both have an automatic retry implementation like this by default. In modern hosting environments there will be movement, and it can happen that Redis is gone for a second or two once in a while. Therefore, it would be nice if a Flow dev wouldn't have to think about this and her application would just work.

I see @bwaidelich's point that sometimes retries can worsen things. You need to apply the "circuit breaker" design pattern for those cases. However, it depends…

I wouldn't like to introduce another wrapper backend, since it makes the setup complicated, and I think that a reasonable retry is what most people want.

How about making this configurable (on/off and number of retries), using the same option names for RedisBackend, PDOBackend and any other backend that could benefit from this?
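
Since Flow's cache backends map each backendOptions entry to a setter method, such shared options could look like this sketch (the option names retryEnabled / maxRetries are assumptions, not an agreed-upon API):

```php
// In RedisBackend, PdoBackend, ... alike; would enable e.g.
//   backendOptions:
//     retryEnabled: true
//     maxRetries: 8
protected bool $retryEnabled = false;
protected int $maxRetries = 8;

public function setRetryEnabled(bool $retryEnabled): void
{
    $this->retryEnabled = $retryEnabled;
}

public function setMaxRetries(int $maxRetries): void
{
    $this->maxRetries = $maxRetries;
}
```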
