
Add configuration for Hikari connection pool for database evolutions #480

Merged
merged 1 commit into from
Aug 21, 2024

Conversation

rebecca-thompson
Contributor

@rebecca-thompson rebecca-thompson commented Aug 21, 2024

Our Typerighter PROD deployments have been failing. Upon investigation, the last instance in the auto-scaling group doesn't pass health checks, failing with the following exception:

org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections

The PROD RDS database is currently a db.t4g.micro which has max_connections of 81, with at least 5 of those connections reserved for internal Postgres processes.

The max pool size for database operations is set to 7, so you would think that even when doubling the auto-scaling group to 6 instances during deployment, we would be under the threshold. However, the app also uses applicationEvolutions to run database evolutions when the app starts. This uses a separate Hikari connection pool, whose size defaults to 10. The initial connections per instance can then reach 17, which is TOO MANY :)
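The arithmetic above can be checked with a quick back-of-the-envelope calculation, using only the numbers quoted in this description:

```python
# Capacity check using the figures from this PR description.
max_connections = 81   # db.t4g.micro default max_connections
reserved = 5           # reserved for internal Postgres processes
instances = 6          # auto-scaling group doubled during deployment
app_pool = 7           # pool size for regular database operations
evolutions_pool = 10   # Hikari default used by applicationEvolutions

per_instance = app_pool + evolutions_pool   # 17
peak = instances * per_instance             # 102
available = max_connections - reserved      # 76

print(f"peak={peak}, available={available}, exhausted={peak > available}")
```

Peak demand (102) comfortably exceeds the usable slots (76), which matches the `remaining connection slots are reserved` error seen on PROD.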

What does this change?

This sets the Hikari connection pool to something small and sensible, given the size of the database.
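For context, Play exposes HikariCP settings through `application.conf`. The actual diff isn't shown here, so the key and value below are illustrative of the kind of change this PR makes, not the literal configuration merged:

```hocon
# Hypothetical sketch — cap the pool used by the default datasource (and hence
# by database evolutions) to a small number; the value 2 is illustrative.
db.default.hikaricp.maximumPoolSize = 2
```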

How to test

When deployed to CODE, we should see the number of connections in the RDS dashboard spike less sharply during deployment. The total number of connections should also decrease. The Rule Manager app should still come up and work as expected (https://manager.typerighter.code.dev-gutools.co.uk/) - we haven't touched the connection pool size for regular database operations, so performance of the app should be unaffected.

How can we measure success?

PROD deploys no longer fail

@rebecca-thompson rebecca-thompson requested a review from a team as a code owner August 21, 2024 09:22
Contributor

@jonathonherbert jonathonherbert left a comment

This is a convincing explanation – do we know what changed that suddenly meant our new instances were exhausting the connection pool?

@rebecca-thompson
Contributor Author

This is a convincing explanation – do we know what changed that suddenly meant our new instances were exhausting the connection pool?

Honestly not sure. There were some failed PROD deployments in June that correspond to high db connections, and then it seemed to settle down until last week, when deploys consistently started failing. From the graphs it looks like sometimes the app holds onto the connections for long enough to fail the deploy, and other times it doesn't.

[Screenshot: RDS database connections graph, 2024-08-21 10:43]

@rebecca-thompson rebecca-thompson merged commit 6f475da into main Aug 21, 2024
4 checks passed
@rebecca-thompson rebecca-thompson deleted the bt/add-hikari-cp-config branch August 21, 2024 09:47
@prout-bot
Copy link

Seen on Rule Manager (merged by @rebecca-thompson 10 minutes and 34 seconds ago) Please check your changes!

@prout-bot

Overdue on Checker (merged by @rebecca-thompson 15 minutes and 3 seconds ago) What's gone wrong?
