[hasura] further fine tuning of scaling and health check config #4165
Conversation
@@ -44,7 +44,8 @@ config:
  secure: AAABANHLs3ItPxkteh0chwMP2bKuHO3ovuRLi4FsIrCqerzXVIaTLFDqNR+4KBTeMPz4cnF5tCTwsrJv9GruZdXU+lg=
application:hasura-proxy-cpu: "512"
application:hasura-proxy-memory: "1024"
application:hasura-scaling-maximum: "2"
application:hasura-service-scaling-minimum: "1"
Making it explicit that this is something we can adjust as the need arises
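For context, a minimal sketch of how a Pulumi stack config value like this is typically read in the service definition; the variable names and the `requireNumber` alternative are illustrative assumptions, only the `parseInt(config.require(...))` pattern is taken from the diff below:

import * as pulumi from "@pulumi/pulumi";

// "application" is the config namespace implied by the "application:" key prefix above
const config = new pulumi.Config("application");

// require() fails the preview/update if the key is missing, which keeps the
// scaling bounds an explicit, reviewable setting rather than a hidden default
const scalingMinimum = parseInt(config.require("hasura-service-scaling-minimum"));
const scalingMaximum = parseInt(config.require("hasura-service-scaling-maximum"));
// (config.requireNumber(...) would avoid the manual parseInt)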
timeout: 3,
// use wget since busybox applet is included in Alpine base image (curl is not)
command: ["CMD-SHELL", `wget --spider --quiet http://localhost:${HASURA_PROXY_PORT}/healthz || exit 1`],
// generous config; if hasura is saturated/blocking, we give service a chance to scale out before whole task is replaced
Scale-out only happens if CPU usage exceeds 30% for 3 minutes in a row (the default setup for the pre-defined metrics used below), but if the ALB considers the task unhealthy before that, it will replace the task before scale-out can happen!
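To make that interplay concrete, here is a minimal sketch of a "generous" container health check alongside a CPU target-tracking policy using the Pulumi AWS provider; the resource names, cluster/service IDs, interval, retries, and start period are illustrative assumptions, not the exact values from this PR:

import * as aws from "@pulumi/aws";

const HASURA_PROXY_PORT = 80; // assumption; the real port comes from the service definition

// Generous container health check (goes into the task's container definition):
// several retries and a long start period, so a briefly saturated Hasura is not
// killed by ECS/ALB before scaling has a chance to kick in.
const healthCheck = {
    command: ["CMD-SHELL", `wget --spider --quiet http://localhost:${HASURA_PROXY_PORT}/healthz || exit 1`],
    interval: 30,    // seconds between checks
    timeout: 3,      // seconds before a single check counts as failed
    retries: 5,      // consecutive failures before the task is marked unhealthy
    startPeriod: 60, // grace period after container start
};

// Target-tracking policy on average service CPU: with a 30% target and the
// predefined ECS metric, scale-out only fires once the underlying alarm breaches
// (by default, several consecutive one-minute datapoints).
const scalingTarget = new aws.appautoscaling.Target("hasura-scaling-target", {
    minCapacity: 1,
    maxCapacity: 2, // would come from config.require("hasura-service-scaling-maximum")
    resourceId: "service/my-cluster/hasura", // hypothetical cluster/service names
    scalableDimension: "ecs:service:DesiredCount",
    serviceNamespace: "ecs",
});

new aws.appautoscaling.Policy("hasura-cpu-scaling", {
    policyType: "TargetTrackingScaling",
    resourceId: scalingTarget.resourceId,
    scalableDimension: scalingTarget.scalableDimension,
    serviceNamespace: scalingTarget.serviceNamespace,
    targetTrackingScalingPolicyConfiguration: {
        predefinedMetricSpecification: {
            predefinedMetricType: "ECSServiceAverageCPUUtilization",
        },
        targetValue: 30,
    },
});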
minCapacity: 1,
maxCapacity: parseInt(config.require("hasura-service-scaling-maximum")),
// minCapacity should reflect the baseline load expected
// see: https://hasura.io/docs/2.0/deployment/performance-tuning/#scalability
Note in particular their formula:
total_nodes = required_ccu / requests_per_node + backup_node
Translation for our case: "requests per node" capacity is roughly 500 concurrent users (doing heavy work like pulling down published flow data). So if we wanted to serve 2,000 such users (which we may at some future juncture), by Hasura's formula we should be running at least 5 nodes (4, plus 1 backup for good measure).
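A quick worked version of that sizing formula; the 500-users-per-node and 2,000-user figures are the estimates from the comment above, not measured values:

// Hasura's sizing formula: total_nodes = required_ccu / requests_per_node + backup_node
const requestsPerNode = 500; // estimated concurrent users one node can handle
const requiredCcu = 2000;    // hypothetical future concurrent-user target
const backupNodes = 1;       // one spare node "for good measure"

const totalNodes = Math.ceil(requiredCcu / requestsPerNode) + backupNodes; // = 5

// This is the kind of number that would feed "hasura-service-scaling-maximum"
// (and the minimum, if that load were the expected baseline).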
Removed Vultr server and associated DNS entries
As always - really appreciate the helpful comments here 👌
Merged to
Relevant Trello ticket: https://trello.com/c/oRzedbmJ
Refinements to #4161 after further testing - see that PR / this doc for discussion.