[hasura] further fine tuning of scaling and health check config #4165

freemvmt · 2025-01-17T17:54:35Z

Relevant Trello ticket: https://trello.com/c/oRzedbmJ

Refinements to #4161 after further testing - see that PR / this doc for discussion.

freemvmt · 2025-01-17T17:55:01Z

infrastructure/application/Pulumi.staging.yaml

@@ -44,7 +44,8 @@ config:
    secure: AAABANHLs3ItPxkteh0chwMP2bKuHO3ovuRLi4FsIrCqerzXVIaTLFDqNR+4KBTeMPz4cnF5tCTwsrJv9GruZdXU+lg=
  application:hasura-proxy-cpu: "512"
  application:hasura-proxy-memory: "1024"
-  application:hasura-scaling-maximum: "2"
+  application:hasura-service-scaling-minimum: "1"


Making the minimum explicit, since this is something we can adjust as the need arises.
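Here is a minimal TypeScript sketch (the language of `hasura.ts`) that validates the min/max scaling bounds after they are read from Pulumi config. `parseScalingBounds` and `ScalingBounds` are hypothetical names. The raw strings are passed in directly so the sketch runs without Pulumi. The rule that the minimum must be at least 1 is this sketch's own assumption.

```typescript
// Hypothetical helper: validate scaling bounds read from Pulumi config.
// In the real stack the strings would come from config.require(...);
// here they are passed in directly so the sketch runs standalone.

interface ScalingBounds {
  minCapacity: number;
  maxCapacity: number;
}

function parseScalingBounds(minRaw: string, maxRaw: string): ScalingBounds {
  // Explicit radix 10 avoids surprises with leading zeros or "0x" prefixes
  const minCapacity = Number.parseInt(minRaw, 10);
  const maxCapacity = Number.parseInt(maxRaw, 10);

  // parseInt returns NaN for non-numeric input, which fails this check
  if (!Number.isInteger(minCapacity) || !Number.isInteger(maxCapacity)) {
    throw new Error(`scaling bounds must be integers, got "${minRaw}" / "${maxRaw}"`);
  }
  // Assumption of this sketch: always keep at least one Hasura task running
  if (minCapacity < 1) {
    throw new Error(`minimum capacity must be at least 1, got ${minCapacity}`);
  }
  if (maxCapacity < minCapacity) {
    throw new Error(`maximum (${maxCapacity}) is below minimum (${minCapacity})`);
  }
  return { minCapacity, maxCapacity };
}

// Example values only, not taken from any stack config
const bounds = parseScalingBounds("1", "3");
console.log(bounds.minCapacity, bounds.maxCapacity);
```

Failing fast at deploy time is the design choice here. A typo in the YAML, or a minimum set above the maximum, then surfaces as a clear error instead of an odd scaling target.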

freemvmt · 2025-01-17T17:57:00Z

infrastructure/application/services/hasura.ts

-            timeout: 3,
+            // use wget since busybox applet is included in Alpine base image (curl is not)
+            command: ["CMD-SHELL", `wget --spider --quiet http://localhost:${HASURA_PROXY_PORT}/healthz || exit 1`],
+            // generous config; if hasura is saturated/blocking, we give service a chance to scale out before whole task is replaced


Scale-out only happens if CPU usage exceeds 30% for 3 minutes in a row (the default setup for the predefined metrics used below). If the ALB considers the task unhealthy before then, it will replace the task before scale-out can happen!
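Here is a rough TypeScript model of the race described above. It makes two assumptions. First, the scale-out alarm fires once 3 consecutive one-minute CPU averages exceed the target. Second, a task is marked unhealthy after roughly the check interval times the number of consecutive failures. The second point holds in spirit for both an ALB target group check and the container health check in the diff, ignoring per-check timeouts. `scaleOutMinute` and `secondsUntilUnhealthy` are hypothetical helpers. The numbers in the usage lines are illustrative, not the values in this PR.

```typescript
/**
 * Minute (1-based) at which scale-out would trigger, or null if it never does.
 * Models an alarm that needs `datapoints` consecutive one-minute averages above the target.
 */
function scaleOutMinute(cpuPerMinute: number[], targetPercent: number, datapoints = 3): number | null {
  let consecutive = 0;
  for (let i = 0; i < cpuPerMinute.length; i++) {
    // Any minute at or below the target resets the streak
    consecutive = cpuPerMinute[i] > targetPercent ? consecutive + 1 : 0;
    if (consecutive >= datapoints) {
      return i + 1;
    }
  }
  return null;
}

/**
 * Rough worst-case seconds from the service becoming unresponsive until the
 * task is marked unhealthy: one interval per consecutive failed check
 * (per-check timeouts ignored).
 */
function secondsUntilUnhealthy(intervalSeconds: number, failureThreshold: number): number {
  return intervalSeconds * failureThreshold;
}

// Illustrative numbers only, assuming CPU is saturated from the first minute
const cpu = [85, 90, 95, 95];
const scaleOutAfterSeconds = (scaleOutMinute(cpu, 30) ?? Infinity) * 60;

// A tight check (10s interval, 3 failures) gives up after ~30s, well before scale-out
console.log(secondsUntilUnhealthy(10, 3) < scaleOutAfterSeconds);
// A generous check (30s interval, 10 failures) waits ~300s, so scale-out gets its chance first
console.log(secondsUntilUnhealthy(30, 10) < scaleOutAfterSeconds);
```

In this model, the health check must stay tolerant for longer than the alarm's evaluation window. Otherwise a saturated task is recycled before a second task is ever added.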

freemvmt · 2025-01-17T17:59:45Z

infrastructure/application/services/hasura.ts

-    minCapacity: 1,
+    maxCapacity: parseInt(config.require("hasura-service-scaling-maximum")),
+    // minCapacity should reflect the baseline load expected
+    // see: https://hasura.io/docs/2.0/deployment/performance-tuning/#scalability


note in particular their formula:

total_nodes = required_ccu / requests_per_node + backup_node

Translation for our case: 'requests per node' capacity is ~500 concurrent users (doing heavy work like pulling down published flow data). If we wanted to serve 2000 such users (which we may at some future point), by Hasura's count we should be running 5 nodes at minimum (2000 / 500 = 4, plus 1 backup node for good measure).
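Hasura's sizing formula can be written as a small TypeScript function. `totalNodes` is a hypothetical name. Rounding the division up is an assumption of this sketch, since the formula as quoted does not say how to treat a fractional node. A partial node still needs a whole task.

```typescript
// Sketch of Hasura's sizing formula:
//   total_nodes = required_ccu / requests_per_node + backup_node
// Assumption: round the division up, since a fractional node still needs a whole task.
function totalNodes(requiredCcu: number, requestsPerNode: number, backupNodes = 1): number {
  if (requestsPerNode <= 0) {
    throw new Error("requestsPerNode must be positive");
  }
  return Math.ceil(requiredCcu / requestsPerNode) + backupNodes;
}

// Figures from the comment above: ~500 heavy users per node, 2000 target users
console.log(totalNodes(2000, 500)); // 5
// A slightly higher target tips over into another node
console.log(totalNodes(2100, 500)); // 6
```

The result would feed the minimum capacity, while the maximum leaves headroom for bursts above that baseline.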

github-actions · 2025-01-17T18:17:18Z

Removed vultr server and associated DNS entries

DafyddLlyr

As always - really appreciate the helpful comments here 👌

DafyddLlyr · 2025-01-20T09:20:43Z

Merged to main so it gets picked up by the next prod deploy.

[hasura] further fine tuning of scaling and health check config

2166e97

freemvmt requested a review from a team January 17, 2025 17:54

freemvmt commented Jan 17, 2025

DafyddLlyr approved these changes Jan 18, 2025

DafyddLlyr merged commit 854b698 into main Jan 20, 2025
12 checks passed

DafyddLlyr deleted the hasura-polish-2 branch January 20, 2025 09:20

DafyddLlyr mentioned this pull request Jan 20, 2025

Production deploy #4167

Merged
