Support JAX in Kueue using training-operator 1.9 #4073
Comments
cc @mbobrovskyi @mszadkow
I'm fine with supporting JAXJob in Kueue! Just FYI: we have not confirmed whether JAXJob is compatible with TPU 😞
IIUC, you haven't tested it, but that doesn't mean it isn't working? I don't expect much of a difference between GPU and TPU, but without testing it remains unknown. In any case, I would be OK with claiming support for GPUs.
Actually, we have not verified either GPU or TPU. We verified only CPU.
I see. Getting some confirmation that the controller works with GPU or TPU would be great.
What would you like to be added:
Support for the JAX integration recently added to the training-operator; see kubeflow/training-operator#1619.
Why is this needed:
For completeness of the training-operator support in Kueue. Also, JAX support was long awaited in the training-operator, and there will be a community of people willing to use it with Kueue.