
Jcs/update hsds confg #73

Open · wants to merge 2 commits into main from jcs/update-hsds-confg

Conversation

@jcschaff (Contributor) commented Aug 28, 2023

The intent is to improve the stability of the HSDS service (and the cluster as a whole).

Explicitly supplying container resource specifications is considered best practice and helps avoid oversubscribed nodes, which could otherwise starve services during heavy use (for example, when the HSDS services are under load).

  1. Minor change to the HSDS configuration, using the new data-node K8s selector:
    k8s_dn_label_selector: app=hsds
    instead of the older app-label parameter (which is deprecated); both parameters are included for compatibility. See the sketch after the list below.

  2. K8s container resource specs are now supplied for HSDS and all biosim services. All limits below are newly introduced unless otherwise noted; a sketch of the corresponding manifest fragment follows the list.

  • hsds-sn (HSDS service node)

    • resources.requests.memory: 1Gi
    • resources.requests.cpu: 500m
    • resources.limits.memory: 1Gi
    • resources.limits.cpu: 1000m
  • hsds-dn (HSDS data node)

    • resources.requests.memory: 2Gi
    • resources.requests.cpu: 500m
    • resources.limits.memory: 2Gi
    • resources.limits.cpu: 1000m
  • combine-api

    • resources.requests.memory: 1Gi (unchanged)
    • resources.requests.cpu: 500m (was 25m)
    • resources.limits.memory: 2Gi (unchanged)
    • resources.limits.cpu: 1000m (unchanged)
  • account-api

    • resources.requests.memory: 500Mi
    • resources.requests.cpu: 200m
    • resources.limits.memory: 1Gi
    • resources.limits.cpu: 500m
  • simulators-api

    • resources.requests.memory: 1Gi
    • resources.requests.cpu: 500m
    • resources.limits.memory: 2Gi
    • resources.limits.cpu: 1000m
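
For illustration, a minimal sketch of the two changes, assuming an HSDS config override file and a standard Deployment manifest; the file, image, and container names here are assumptions, not necessarily the exact ones in this repository.

```yaml
# HSDS config override (file name assumed): new data-node label selector,
# kept alongside the deprecated app-label setting for compatibility.
k8s_dn_label_selector: app=hsds
```

```yaml
# Container resource spec for the HSDS data node, using the hsds-dn values
# listed above (Deployment/container/image names are illustrative).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hsds-dn
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hsds
  template:
    metadata:
      labels:
        app: hsds            # matches the k8s_dn_label_selector above
    spec:
      containers:
        - name: hsds-dn
          image: hdfgroup/hsds   # image name assumed for the sketch
          resources:
            requests:
              memory: 2Gi
              cpu: 500m
            limits:
              memory: 2Gi
              cpu: 1000m
```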

@jcschaff added the enhancement (New feature or request) and deployment (deployment updates) labels Aug 28, 2023
@jcschaff self-assigned this Aug 28, 2023
@bilalshaikh42 (Member) commented:

The only concern with limits is that the Kubernetes behavior is to kill the pod if it exceeds its memory limit, even when there are resources available. This was the reason I did not include the limits in the first place.
It might make sense to set the requests and the limits to the same value to try to get a Guaranteed QoS class for the pods instead, but this comes with the same fundamental limitation (pods being killed for exceeding limits), which might just increase the instability.
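
For reference, a minimal sketch of the requests-equal-limits pattern being described; the values are illustrative, not a recommendation.

```yaml
# When requests == limits for both cpu and memory (for every container in
# the pod), Kubernetes assigns the pod the Guaranteed QoS class. A container
# that exceeds its memory limit is still OOM-killed.
resources:
  requests:
    memory: 1Gi
    cpu: 500m
  limits:
    memory: 1Gi
    cpu: 500m
```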

These configurations were very much just guess-and-check before I fully understood how this worked, so there is definitely a need for more tweaking and experimentation.

@jcschaff (Contributor, Author) commented:


@bilalshaikh42 thanks for your perspective. I was aware of the pods getting killed if they exceeded the limits, but was not aware of the Guaranteed QoS class ... will look into it. I'll investigate further before I merge.

@jcschaff force-pushed the jcs/update-hsds-confg branch from 7096b95 to 0bbd8c2 on September 29, 2023 14:50