[hasura] further fine tuning of scaling and health check config #4165

Merged
merged 1 commit into from
Jan 20, 2025

freemvmt

Relevant Trello ticket: https://trello.com/c/oRzedbmJ

Refinements to #4161 after further testing - see that PR / this doc for discussion.

@freemvmt freemvmt requested a review from a team January 17, 2025 17:54
@@ -44,7 +44,8 @@ config:
secure: AAABANHLs3ItPxkteh0chwMP2bKuHO3ovuRLi4FsIrCqerzXVIaTLFDqNR+4KBTeMPz4cnF5tCTwsrJv9GruZdXU+lg=
application:hasura-proxy-cpu: "512"
application:hasura-proxy-memory: "1024"
application:hasura-scaling-maximum: "2"
application:hasura-service-scaling-minimum: "1"

Making it explicit that this is something we can adjust as the need arises

timeout: 3,
// use wget since busybox applet is included in Alpine base image (curl is not)
command: ["CMD-SHELL", `wget --spider --quiet http://localhost:${HASURA_PROXY_PORT}/healthz || exit 1`],
// generous config; if hasura is saturated/blocking, we give service a chance to scale out before whole task is replaced

Scale-out only happens if CPU usage exceeds 30% for 3 minutes in a row (the default setup for the pre-defined metrics used below). If the ALB considers the task unhealthy before that point, it will replace the task before scaling can kick in!
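
To make the timing argument concrete, here is a rough sketch of the comparison, not the actual config from this PR. The function names, the `intervalSeconds` and `failuresBeforeUnhealthy` fields, and the example values are all hypothetical. The only figure taken from the comment above is the 3 minute scale-out window. The estimate of "time until unhealthy" is deliberately crude (interval multiplied by the number of consecutive failures allowed).

```typescript
// Hypothetical shape for health check timing; field names are illustrative only.
interface HealthCheckTiming {
  intervalSeconds: number; // time between consecutive checks
  failuresBeforeUnhealthy: number; // consecutive failures tolerated before replacement
}

// 3 minutes of sustained high CPU before scale-out, per the comment above.
const SCALE_OUT_DELAY_SECONDS = 3 * 60;

// Rough approximation (an assumption): the task is marked unhealthy after
// `failuresBeforeUnhealthy` failed checks spaced `intervalSeconds` apart.
function secondsUntilUnhealthy(hc: HealthCheckTiming): number {
  return hc.intervalSeconds * hc.failuresBeforeUnhealthy;
}

// True if a saturated task would still be alive when scale-out triggers.
function survivesUntilScaleOut(
  hc: HealthCheckTiming,
  scaleOutDelaySeconds: number = SCALE_OUT_DELAY_SECONDS,
): boolean {
  return secondsUntilUnhealthy(hc) > scaleOutDelaySeconds;
}

// Example values are made up for illustration.
const strict: HealthCheckTiming = { intervalSeconds: 15, failuresBeforeUnhealthy: 3 };
const generous: HealthCheckTiming = { intervalSeconds: 30, failuresBeforeUnhealthy: 10 };

console.log(survivesUntilScaleOut(strict)); // false (45s < 180s)
console.log(survivesUntilScaleOut(generous)); // true (300s > 180s)
```

Under these assumptions, a strict check replaces the task well before the service has a chance to scale out, which is the situation the generous config is meant to avoid.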

minCapacity: 1,
maxCapacity: parseInt(config.require("hasura-service-scaling-maximum")),
// minCapacity should reflect the baseline load expected
// see: https://hasura.io/docs/2.0/deployment/performance-tuning/#scalability

Note in particular their formula:

total_nodes = required_ccu / requests_per_node + backup_node

Translation for our case: the 'requests per node' capacity is ~500 concurrent users (doing heavy stuff like pulling down published flow data). So if we wanted to serve 2000 such users (which we may at some future juncture), by Hasura's count we should be running 5 nodes at minimum (4 + 1 for good measure).
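
The arithmetic above can be sketched as a small helper. This is only an illustration of the formula quoted from the Hasura docs; the function name `totalNodes` and its parameter names are hypothetical, and rounding up a fractional node count is an added assumption, not part of the quoted formula.

```typescript
// Sketch of: total_nodes = required_ccu / requests_per_node + backup_node
// `totalNodes` and its parameters are hypothetical names for illustration.
function totalNodes(
  requiredCcu: number, // concurrent users we want to serve
  requestsPerNode: number, // concurrent users one node can handle
  backupNodes: number = 1, // spare capacity "for good measure"
): number {
  if (requestsPerNode <= 0) {
    throw new Error("requestsPerNode must be positive");
  }
  // Assumption: round up, so a partial node's worth of load gets a whole node.
  return Math.ceil(requiredCcu / requestsPerNode) + backupNodes;
}

// The case described above: 2000 heavy users at ~500 per node.
console.log(totalNodes(2000, 500)); // 5
```

A result like this could inform the `minCapacity` value, since the comment in the diff says it should reflect the expected baseline load.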

github-actions bot commented Jan 17, 2025

Removed vultr server and associated DNS entries


@DafyddLlyr DafyddLlyr left a comment

As always - really appreciate the helpful comments here 👌

@DafyddLlyr

Merged to main in order to be picked up on next prod deploy

@DafyddLlyr DafyddLlyr merged commit 854b698 into main Jan 20, 2025
12 checks passed
@DafyddLlyr DafyddLlyr deleted the hasura-polish-2 branch January 20, 2025 09:20
@DafyddLlyr DafyddLlyr mentioned this pull request Jan 20, 2025