@davissp14, to help us get a sense of priority on this: how is this metric used? Are there frequent cases where having this metric would have prevented users from hitting walls?
I'll +1 this with a use case: we alarm on high replication lag to give us an early warning if a replica is falling behind. This metric is important because a laggy replica often looks totally fine on every other technical measure (normal memory, CPU, etc.) but looks totally wrong to users ("Why won't my data save?", "Why can't I access new values?", etc.). A few months after this metric stopped working, we had a replica fall way behind and cause mass user pain in a region. We didn't catch the issue until we noticed a massive spike in customer contacts, which is not an ideal approach to error detection.