
Scale faucet #278

Open
mutantcornholio opened this issue May 26, 2023 · 12 comments
@mutantcornholio
Contributor

Currently, failed deployments lead to outages.
Let's run two instances of each faucet, so a failed deployment leads to a stuck deploy rather than downtime.
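
For illustration, a minimal sketch of what that could look like in the Deployment spec (the names, image, and values are placeholders, not the actual chart): two replicas with a rolling update that never removes a healthy pod before its replacement is Ready, so a broken image leaves the rollout stuck while the old pods keep serving.

```yaml
# Hypothetical excerpt of a faucet Deployment manifest (not the real chart)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: faucet
spec:
  replicas: 2                  # two instances per faucet
  selector:
    matchLabels:
      app: faucet
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # keep the old pod until its replacement is Ready
      maxSurge: 1              # roll out one new pod at a time
  template:
    metadata:
      labels:
        app: faucet
    spec:
      containers:
        - name: faucet
          image: paritytech/faucet:latest   # placeholder image
```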

@mutantcornholio mutantcornholio self-assigned this May 26, 2023
@mordamax
Contributor

@mutantcornholio could you link to pipelines or some logs or provide log examples?

Is it that when you deploy and the app can't start, it 1. fails the CI job and 2. still ends up with the broken code deployed?

@PierreBesson
Contributor

The problem is that if you "scale" the faucet to 2 instances then there will be two processes listening to messages on Matrix and so drips will be produced twice.

@mutantcornholio
Contributor Author

The problem is that if you "scale" the faucet to 2 instances then there will be two processes listening to messages on Matrix and so drips will be produced twice.

Yes, that obviously needs to be dealt with.

@mutantcornholio
Contributor Author

I'm probably more inclined to do it the same way gitspiegel works: upon receiving a message, write its id to a shared database; if another instance has beaten us to it, ignore the message (see the sketch below).

@paritytech/opstooling WDYT?
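
A minimal sketch of that first-writer-wins idea, assuming a shared Postgres-like store; the table, column, and function names are illustrative, not the faucet's actual schema:

```ts
// Hypothetical first-writer-wins claim on a shared table.
// Illustrative schema: CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the usual PG* env vars

// Returns true if this instance claimed the Matrix event,
// false if another instance already recorded the same event id.
async function claimEvent(eventId: string): Promise<boolean> {
  const res = await pool.query(
    "INSERT INTO processed_events (event_id) VALUES ($1) ON CONFLICT (event_id) DO NOTHING",
    [eventId],
  );
  return res.rowCount === 1;
}

async function onMatrixMessage(eventId: string, sendDrip: () => Promise<void>) {
  if (!(await claimEvent(eventId))) return; // another instance got there first
  await sendDrip();
}
```

The primary key on the event id is what makes the race safe: both instances can try to insert, but only one row lands, and only that instance drips.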

@chevdor
Contributor

chevdor commented May 30, 2023

For this use case, it sounds much more appropriate to use a queue such as RabbitMQ (just to throw a name).

You'd need one (or more) listeners that add the "job" to the queue. If you use several listeners, you want to make sure a key is used to prevent duplicates.

With that, it becomes much easier to have as many "workers" as you wish (i.e. k8s deployments) that pick up the tasks and remove them from the queue once they are successfully done. If a worker fails, the entry remains in the queue and can be picked up by the next worker.

@mutantcornholio
Contributor Author

mutantcornholio commented May 30, 2023

The cost of splitting the instance into master / worker wouldn't be worth it IMO.

The goal is to have the minimum redundancy that allows maintenance without downtime.
The actual load here is laughable and unlikely to require horizontal scaling in the foreseeable future.

If we go with splitting the instance, we'd end up with four instances for every network, while we'd do perfectly fine with two.
Requiring two instances for local development is also a downside.

We could still go with a job queue, but have both the producer and the consumer in the same instance. That would be basically the same as what I suggested, except now we'd get things like retries and timeouts for free.
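
A sketch of that in-process variant, using BullMQ purely as an example of a Redis-backed queue (chevdor mentioned RabbitMQ; the queue name, payload, and retry settings below are assumptions): the same process enqueues a job per Matrix event and consumes it, and using the event id as the jobId gives the key-based dedup mentioned above.

```ts
import { Queue, Worker } from "bullmq";

const connection = { host: "127.0.0.1", port: 6379 }; // assumed Redis location

// Producer side: called from the Matrix listener in the same process.
const dripQueue = new Queue("drips", { connection });

async function enqueueDrip(eventId: string, address: string) {
  // Using the Matrix event id as jobId makes a second add() of the same event a no-op,
  // so two instances can both listen to Matrix without double-dripping.
  await dripQueue.add(
    "drip",
    { address },
    { jobId: eventId, attempts: 3, backoff: { type: "exponential", delay: 5_000 } },
  );
}

// Consumer side, in the same process: failed jobs are retried per the attempts/backoff above.
new Worker(
  "drips",
  async (job) => {
    await sendDrip(job.data.address); // hypothetical function submitting the on-chain transfer
  },
  { connection },
);

async function sendDrip(address: string) {
  console.log(`dripping to ${address}`); // placeholder for the actual drip logic
}
```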

@mordamax
Contributor

mordamax commented May 30, 2023

I'm a bit confused: why wasn't this a problem before?
AFAIR, when we used to deploy through helm, it would spin up a new instance and, only if it started OK, replace the old instance with the new one. If I tried to deploy broken code or configs, it would just fail at the CI level and prod wouldn't be affected.
Is it working differently via ArgoCD?

If yes, are there other ways to solve it rather than having 2 instances and introducing a DB etc... ?

That sounds like overkill to me for a problem of wrong configuration or something similar.

@mutantcornholio
Contributor Author

AFAIR, when we used to deploy through helm, it would spin up a new instance and, only if it started OK, replace the old instance with the new one. If I tried to deploy broken code or configs, it would just fail at the CI level and prod wouldn't be affected.
Is it working differently via ArgoCD?
If yes, are there other ways to solve it rather than having 2 instances and introducing a DB etc... ?

Feels like it always worked like that and nobody cared.

I don't think any deployment configuration can get around the problem of two instances listening to the same Matrix events and producing duplicate drips as a result.
"Replacing" the instance implies switching the backend behind a load balancer; since the instances pull their load themselves, that approach won't work here.

@chevdor
Contributor

chevdor commented May 30, 2023

Feels like it always worked like that and nobody cared.

I think we have been lucky so far.
The helm chart uses Deployments and allows an arbitrary number of replicas using rolling updates.
All is good as long as we use only one replica.

@mutantcornholio
Contributor Author

I also realised that the faucet stores its drips in a local, non-persistent (!) sqlite database.
It needs them to check the daily/hourly quotas. That also needs to be addressed.

However, it's all simple stuff, isn't it?
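
For context, the kind of check those drip records feed looks roughly like this (the file name, table, and columns are guesses for illustration, not the faucet's real schema), which is why they need to survive restarts:

```ts
// Hypothetical hourly-quota check against the local sqlite drip log.
import Database from "better-sqlite3";

const db = new Database("drips.sqlite"); // assumed path; currently on a non-persistent volume

// Count how many drips an address received in the last hour.
function dripsInLastHour(address: string): number {
  const row = db
    .prepare(
      "SELECT COUNT(*) AS n FROM drips WHERE address = ? AND timestamp > datetime('now', '-1 hour')",
    )
    .get(address) as { n: number };
  return row.n;
}

function underHourlyQuota(address: string, limit = 1): boolean {
  return dripsInLastHour(address) < limit;
}
```

If that sqlite file lives on an ephemeral volume, every redeploy resets the counts, so the data needs either a persistent volume or the shared store discussed above.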

@mordamax
Contributor

I can't find the logs anymore, unfortunately (it'd be great to save a snapshot in text format next time),
so I'm not sure that scaling to 2 instances is the right way to deal with "failed deployments".
Do I understand it right that if we set up the livenessProbe & readinessProbe properly, the deployment should be rolled back if they don't pass? Wouldn't that be the proper fix for the deployment problem?
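
For reference, a sketch of what that probe setup could look like in the container spec (the /health path, port, and timings are placeholders, not the faucet's actual endpoints). With a readinessProbe in place, a broken pod never becomes Ready, so a rolling update stalls instead of serving broken code; an actual rollback would still need `kubectl rollout undo` or equivalent.

```yaml
# Hypothetical probe configuration for the faucet container
containers:
  - name: faucet
    image: paritytech/faucet:latest   # placeholder image
    livenessProbe:
      httpGet:
        path: /health
        port: 5555                    # placeholder port
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /health
        port: 5555
      periodSeconds: 5
```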
