EKS example is not working #37
@edonD thank you for bringing this to my attention, and apologies that you had to spend time debugging this. To apply the fix to the EKS example project here, you can add a block to the 'daskhub.yaml' file so that the beginning of the file looks like this:
I have also gone through the solution and updated the software packages to the latest versions of EKS, eksctl, etc. Pull requests are still pending, so not all of these changes have been merged back into the solution yet.
Wow, I'm impressed with the fast feedback from you guys. I will try it out and let you know. Thank you nonetheless!
The configuration is smooth and works perfectly. The only problem I can see so far is that it somehow doesn't create the gateway cluster. I am using the cmip6_zarr.ipynb example. It gets stuck in the
I tried to start it with fewer workers, but it is still the same. I guess this could have many causes, so if it works on your setup, I can spend some more time trying to debug it.
If you got as far as logging into the notebook, then at least the minimum number of EC2 instances was provisioned correctly. The step you are running into issues with requires additional EC2 instances to be created. The way it works is that Dask worker pods are "scheduled", and because the cluster does not have enough room to fit all of the scheduled pods, the cluster autoscaler steps in and creates additional nodes for those pods to be scheduled on.

So the first thing to check is whether all of those Dask worker pods are scheduled. If they are, the next thing to check is whether the cluster autoscaler was installed correctly, so that it is actually trying to create more EC2 instances. Instantiating more EC2 instances can take a while, sometimes 5-10 minutes.

One more thing to keep in mind: the default configuration for this solution uses Spot Instances for the worker nodes, and Spot Instances are not always available. In some cases your EC2 instances may not launch simply because there is not enough Spot capacity in your region/AZ, and the solution would work if you tried again the next day. Let me know what you find; I'm hoping to help you get this working.
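The first check above (are the Dask worker pods waiting for new nodes?) can be scripted. The sketch below is not part of the solution; it is a minimal helper, assuming you feed it the JSON printed by `kubectl get pods -n <namespace> -o json` and assuming worker pod names start with `dask-worker` (a hypothetical prefix; adjust it to whatever your pods are actually called). It relies only on standard Kubernetes pod status fields: a pod that fits on no existing node stays in phase `Pending` with a `PodScheduled` condition of status `False` and reason `Unschedulable`, which is the signal the cluster autoscaler reacts to.

```python
import json
from typing import Any


def classify_worker_pods(pods_json: str, name_prefix: str = "dask-worker") -> dict[str, list[str]]:
    """Sort worker pods into running / unschedulable / other-pending buckets.

    pods_json: text produced by `kubectl get pods -o json`.
    name_prefix: ASSUMED naming convention for Dask worker pods (hypothetical).
    """
    result: dict[str, list[str]] = {"running": [], "unschedulable": [], "pending_other": []}
    items: list[dict[str, Any]] = json.loads(pods_json).get("items", [])
    for pod in items:
        name = pod.get("metadata", {}).get("name", "")
        if not name.startswith(name_prefix):
            continue  # not a worker pod under the assumed naming scheme
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Running":
            result["running"].append(name)
        elif phase == "Pending":
            # PodScheduled=False with reason Unschedulable means no current node
            # has room; these are the pods the cluster autoscaler should add nodes for.
            unschedulable = any(
                c.get("type") == "PodScheduled"
                and c.get("status") == "False"
                and c.get("reason") == "Unschedulable"
                for c in status.get("conditions", [])
            )
            result["unschedulable" if unschedulable else "pending_other"].append(name)
    return result


if __name__ == "__main__":
    # Small hand-written example input in the shape kubectl emits.
    sample = json.dumps({"items": [
        {"metadata": {"name": "dask-worker-abc-1"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "dask-worker-abc-2"}, "status": {
            "phase": "Pending",
            "conditions": [{"type": "PodScheduled", "status": "False", "reason": "Unschedulable"}]}},
        {"metadata": {"name": "jupyter-user"}, "status": {"phase": "Running"}},
    ]})
    print(classify_worker_pods(sample))
```

If worker pods sit in the `unschedulable` bucket well past the 5-10 minutes mentioned above while `kubectl get nodes` shows no new nodes, that points toward the cluster autoscaler (or Spot capacity) rather than Dask itself. Pods in `pending_other` have already been placed on a node and are waiting on something else, such as an image pull.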
I have tried many times now to implement the EKS example, but it is not working. After following all the steps and signing in with the username and password, JupyterHub just gets stuck. The console log shows 0, and then after 5 minutes it shows 100 but reports that it failed.