Restore SuperCDMS JupyterHub from scratch #90

Open
zonca opened this issue Oct 10, 2023 · 16 comments

Collaborator

zonca commented Oct 10, 2023

Instances of the old deployment are still on Jetstream 2 in the "Shelved Offloaded" state.
However, we prefer to deploy from scratch: the old deployment had some issues, and a fresh one picks up the latest improvements.

Collaborator Author

zonca commented Oct 10, 2023

@zonca will check the status of the allocation, do a test deployment and destroy it, then point Amy to the docs so she can execute the deployment.

We will start with a plain Kubernetes deployment, then add all the other features, in particular #84.

Collaborator Author

zonca commented Oct 11, 2023

Testing the deployment: it works up to the Terraform step. However, by mistake I released the IP that supercdms.jetstream-cloud.org points to, so I opened a ticket.

Now testing the default Ansible deployment of Kubernetes. See also #92.

zonca self-assigned this Oct 11, 2023

Collaborator Author

zonca commented Oct 14, 2023

I am taking the opportunity of rebuilding the deployment to improve it.
I am working on deploying a load balancer on top of JupyterHub, see zonca#45.

Collaborator Author

zonca commented Oct 14, 2023

I have a networking issue with the load balancer and opened a ticket about it.

Collaborator Author

zonca commented Oct 17, 2023

Still debugging with the help of Jetstream support.

Member

pibion commented Oct 20, 2023

@zonca should I still give the instructions a try, or wait until the load balancer is debugged?

Collaborator Author

zonca commented Oct 20, 2023

You can try it as it is now.

Collaborator Author

zonca commented Oct 20, 2023

The issue was fixed by Jetstream support. Now I'll write a tutorial about this, then resume deploying the SuperCDMS JupyterHub.

Collaborator Author

zonca commented Nov 6, 2023

I am finalizing the load balancer tutorial.

Collaborator Author

zonca commented Nov 17, 2023

I hit other issues with the load balancer and am waiting for help from Jetstream support.

Member

pibion commented Jan 10, 2024

So I started trying to redeploy. Is that okay in the current state? I went to https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream/blob/cdms/DEPLOY.md and tried to follow the "REDEPLOY.md" instructions, but I'm not sure which prompt I should be using. I'm guessing it's a prompt on Jetstream?

Collaborator Author

zonca commented Jan 12, 2024

@pibion I'll tag you on other issues for this; see the other notifications.

Collaborator Author

zonca commented Mar 17, 2024

@pibion I have resumed work on this.
Deploying a load balancer has not been reliable, so I am going to deploy without one, as it was before.

Recently we added support for clusters with both CPU and GPU nodes at the same time: https://www.zonca.dev/posts/2024-02-09-kubernetes-gpu-jetstream2

So I am deploying a test cluster with 1 CPU and 1 GPU node.

I'll notify here when it is available for testing.

Collaborator Author

zonca commented Mar 17, 2024

OK, I have a preliminary version of the deployment at:

https://kubejetstream-1.phy210008.projects.jetstream-cloud.org/

I am using a temporary URL for now; we will put it under supercdms.jetstream-cloud.org later.

For now I have deployed just a plain JupyterHub with a TensorFlow image with GPU support and GitLab auth.
No data sharing volumes yet.

@pibion is it OK if we talk about next steps on Tuesday, before or after the kaitai call?

Collaborator Author

zonca commented Mar 26, 2024

The next step is to try using one of the Singularity images, see #84.

Collaborator Author

zonca commented Dec 5, 2024

@pibion we have a deployment here, but no one is using it:

Image

Should I turn it off? Should I go back to working on Singularity (#84)?
