Not able to achieve successful binding rate beyond ~300 pods/sec #71

Open

rishabh325 opened this issue Jan 7, 2025 · 1 comment

@rishabh325
We are not able to achieve a successful binding rate beyond ~300 pods/sec. When running the binder in active-active mode, we see a high conflict rate; in active-passive mode, the overall binding rate stays below ~300 pods/sec.

Configurations:
Nodes: 30k
Pods: ~150k
Creation Rate: ~1.75k pods/sec via clusterloader

Case 1:

| Service    | Leader Elected | Instances | Resource Limit                |
|------------|----------------|-----------|-------------------------------|
| Dispatcher | No             | 4         | 2 instances w/ 32 cores/180Gi |
| Scheduler  | No             | 2         | 2 instances w/ 32 cores/250Gi |
| Binder     | No             | 4         | 2 instances w/ 32 cores/180Gi |

[Screenshot: 2025-01-07 at 11:30:07 AM]

Case 2:

| Service    | Leader Elected | Instances | Resource Limit                |
|------------|----------------|-----------|-------------------------------|
| Dispatcher | No             | 4         | 2 instances w/ 32 cores/180Gi |
| Scheduler  | No             | 2         | 2 instances w/ 32 cores/250Gi |
| Binder     | Yes            | 2         | 2 instances w/ 32 cores/180Gi |

[Screenshot: 2025-01-07 at 11:32:39 AM]
@binacs
Member

binacs commented Feb 2, 2025

Apologies for the temporary absence of a deployment guide for multi-instance setups, which may have caused confusion. We are working to improve this documentation as soon as possible.

In the architecture of the Godel distributed scheduler, only one dispatcher and one binder instance are expected to be active at any given time. Multiple dispatcher/binder instances are deployed for high availability and must utilize leader election to prevent conflicts.
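
As an illustration, a minimal deployment sketch for the binder with leader election enabled might look like the following. The flag name `--leader-elect` and the image/namespace names are assumptions based on the usual Kubernetes component convention, not confirmed godel-scheduler options; please check the binder binary's `--help` output for the exact flags in your release.

```yaml
# Hypothetical sketch: a binder Deployment with leader election turned on.
# Flag and image names are assumed and may differ in godel-scheduler.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: godel-binder
  namespace: godel-system
spec:
  replicas: 2                        # multiple replicas for HA; only the elected leader is active
  selector:
    matchLabels:
      app: godel-binder
  template:
    metadata:
      labels:
        app: godel-binder
    spec:
      containers:
        - name: binder
          image: godel-binder:latest     # placeholder image name
          args:
            - --leader-elect=true        # assumed flag; keeps a single active binder instance
```

The same pattern applies to the dispatcher: run multiple replicas, but make sure leader election is enabled so only one instance processes work at a time.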

For multiple scheduler instances, there are two possible scenarios:

  1. If multiple instances belong to the same shard, they should share the same scheduler name and enable leader election.
  2. If instances belong to different shards, each shard's instances should be assigned a unique scheduler name.
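
As a rough illustration of the two scenarios above, the scheduler arguments might look like the sketch below. The `--scheduler-name` and `--leader-elect` flag names are assumptions following the common Kubernetes convention; confirm them against the scheduler's actual command-line options.

```yaml
# Hypothetical sketch only; flag names are assumed, not confirmed.

# Scenario 1: two instances in the same shard share a scheduler name
# and enable leader election, so only one of them is active at a time.
args:
  - --scheduler-name=godel-scheduler-shard-a
  - --leader-elect=true

# Scenario 2: instances in different shards each get a unique scheduler name,
# so the shards schedule pods concurrently without conflicting, e.g.:
#   shard A instance: --scheduler-name=godel-scheduler-shard-a
#   shard B instance: --scheduler-name=godel-scheduler-shard-b
```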

If you have any further questions, please feel free to continue the discussion in this issue.
