bug(pg): Demonstrating (or trying to) a bug with many crons. #102
This demonstrates an issue where it is easy to end up with more jobs or job handlers than the connection pool will allow. In tests the max conns defaults to 2, but otherwise pgxconn defaults to `runtime.NumCPU`. I'm running into this because I run several small containers, some with less than a single CPU core each (since I don't need much). Because of how the connection pool works, this ends up causing deadlocks: the acquire calls themselves have no timeouts, since `ctx` is just cascaded throughout.
More details about my specific use case can be found here: monetr/monetr#1608
Overall, does it make sense to have a connection pool with an arbitrary hard limit on connections? Even if someone were to configure a higher limit (maybe a significantly higher one), most of the connections would be consumed not by the jobs themselves but by the listeners associated with those jobs. Doesn't that mean you must have at least `N` max conns for `N` jobs?

The other thing is how `ctx` is cascaded throughout. If I call `Start` or `StartCron`, I cannot pass a `WithTimeout` context, because that would cause parts of the handler or listener to time out (like listening for notifications, which is async anyway), when what I actually want to express is "the setup of this job should only take 10 seconds". I think how timeouts should apply to async work running in the background, versus synchronous work initiated by a caller, should be thought about as part of this as well.