-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the naming conventions for Kubeflow Trainer #2415
Update the naming conventions for Kubeflow Trainer #2415
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
" if local_rank == 0:\n", | ||
" # Only download the dataset from local rank 0\n", | ||
" dataset = datasets.FashionMNIST(\n", | ||
" \"./data\",\n", | ||
" train=True,\n", | ||
" download=True,\n", | ||
" transform=transforms.Compose([transforms.ToTensor()]),\n", | ||
" )\n", | ||
" dist.barrier()\n", | ||
" else:\n", | ||
" # Wait for local rank 0 to complete downloading the dataset and load it\n", | ||
" dist.barrier()\n", | ||
" dataset = datasets.FashionMNIST(\n", | ||
" \"./data\",\n", | ||
" train=True,\n", | ||
" download=False,\n", | ||
" transform=transforms.Compose([transforms.ToTensor()]),\n", | ||
" )\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the dataset logic from this example for now. Since this Notebook demonstrates single proc per Node we should be good.
I think, we can find more elegant way to download data only on LOCAL_RANK=0 process.
WDYT @astefanutti ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I agree we can revisit this and find a better approach.
786f92e
to
fbc6cb9
Compare
@andreyvelich We may need to rebase this PR since #2413 changed the API Group. |
Signed-off-by: Andrey Velichkevich <[email protected]>
fbc6cb9
to
0cf4ff6
Compare
It should be good now. |
Signed-off-by: Andrey Velichkevich <[email protected]>
@andreyvelich Looks good to me. Not strictly related to the scope of this PR but I see you've fixed the webhooks paths already :) |
@tenzen-y @Electronic-Waste @astefanutti Any thoughts what I forgot to fix to resolve problem with the Webhook ? |
Signed-off-by: Andrey Velichkevich <[email protected]>
It looks like Webhook paths, should have the correct address. |
It looks correct to me, according to what controller-runtime generates: https://github.com/kubernetes-sigs/controller-runtime/blob/bbc9711d9b2db31ce828714e7d3c875c00b4c1f1/pkg/builder/webhook.go#L287 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just some minor comments
Makefile
Outdated
@@ -102,7 +102,7 @@ endif | |||
# Instructions to run tests. | |||
.PHONY: test | |||
test: ## Run Go unit test. | |||
go test ./pkg/apis/trainer/v2alpha1/... ./pkg/controller.v2/... ./pkg/runtime.v2/... ./pkg/webhooks.v2/... ./pkg/util.v2/... -coverprofile cover.out | |||
go test ./pkg/apis/trainer/v1alpha1/... ./pkg/controller/... ./pkg/runtime/... ./pkg/webhooks/... ./pkg/util/... -coverprofile cover.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
go test ./pkg/apis/trainer/v1alpha1/... ./pkg/controller/... ./pkg/runtime/... ./pkg/webhooks/... ./pkg/util/... -coverprofile cover.out | |
go test $(shell go list ./... | grep -v '/test/') -coverprofile cover.out |
Currently, all Go files should be tested. When I set up this, we have 2 version codes (v1 and v2). So, I specify the dedicated directory.
runtime "github.com/kubeflow/training-operator/pkg/runtime.v2" | ||
runtimecore "github.com/kubeflow/training-operator/pkg/runtime.v2/core" | ||
webhooksv2 "github.com/kubeflow/training-operator/pkg/webhooks.v2" | ||
kubeflowv1 "github.com/kubeflow/trainer/pkg/apis/trainer/v1alpha1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kubeflowv1 "github.com/kubeflow/trainer/pkg/apis/trainer/v1alpha1" | |
kubeflow "github.com/kubeflow/trainer/pkg/apis/trainer/v1alpha1" |
Now, we can just say "kubeflow".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tenzen-y Can we keep the kubeflowv1
for now since in the future, we might want to introduce another APIs: v1alpha2
and we should distinguish it in the controller ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we may need kubeflowv1alpha1
to distinguish versions like v1alpha2 from v1alpha1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind do it in the follow-up PR, given the number of changes this PR already has ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah,I would like to help this:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend using kubeflow
since we should make the import name shorter as much as possible.
Typically, we use kubeflow
or trainer
for the latest internal API version.
After we graduate v1alpha1 to v1beta1, and then introduce new API with v1alpha1, we call the new alpha API as kubeflowv1alpha1
or trainerv1alpha1
.
We can imagine the situation where Kube impl uses core
as import name, but, external projects use corev1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use trainer
to be more explicit here.
runtimecore "github.com/kubeflow/training-operator/pkg/runtime.v2/core" | ||
webhooksv2 "github.com/kubeflow/training-operator/pkg/webhooks.v2" | ||
kubeflowv1 "github.com/kubeflow/trainer/pkg/apis/trainer/v1alpha1" | ||
kubeflowcontroller "github.com/kubeflow/trainer/pkg/controller" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kubeflowcontroller "github.com/kubeflow/trainer/pkg/controller" | |
"github.com/kubeflow/trainer/pkg/controller" |
Have you seen any conflicts with "controller"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it conflicts with this pkg: https://github.com/kubeflow/training-operator/blob/91bd7ef89a76bda71f1df324485b46b905a2c2b3/cmd/trainer-controller-manager/main.go#L32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we typically use ctrl "sigs.k8s.io/controller-runtime/pkg/controller"
.
In go, the package name should be shorter as much as possible. If the shorter name is popular, we can use the shorter name like controller -> ctrl
, format -> fmt
.
Signed-off-by: Andrey Velichkevich <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM. Just a few comments for you @andreyvelich
"github.com/kubeflow/trainer/pkg/runtime" | ||
runtimecore "github.com/kubeflow/trainer/pkg/runtime/core" | ||
"github.com/kubeflow/trainer/pkg/util/cert" | ||
webhooks "github.com/kubeflow/trainer/pkg/webhooks" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also want to distinguish different versions of webhooks just like APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runtimecore "github.com/kubeflow/training-operator/pkg/runtime.v2/core" | ||
webhooksv2 "github.com/kubeflow/training-operator/pkg/webhooks.v2" | ||
kubeflowv1 "github.com/kubeflow/trainer/pkg/apis/trainer/v1alpha1" | ||
kubeflowcontroller "github.com/kubeflow/trainer/pkg/controller" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question for the controller. Do we want to distinguish different versions of controllers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ "For example: [v1.7.0](https://github.com/kubeflow/training-operator/releases/tag/v1.7.0)", | ||
+ "For example: [v1.7.0](https://github.com/kubeflow/trainer/releases/tag/v1.7.0)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we enabled url redirection? If so, we may not need to change this url since v1.7.0
is the release of training-operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, once we rename the repo GitHub automatically add the redirects it.
Signed-off-by: Andrey Velichkevich <[email protected]>
@tenzen-y @Electronic-Waste I renamed |
Great to see, thanks. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Part of: #2170
This PR updates the naming conventions for Kubeflow Trainer.
Also, it changes CRDs to
v1alpha1
.Please give me your thoughts on the naming if that looks good.
Kubeflow Trainer Images:
Kubeflow Trainer k8s resources:
TODOs:
/assign @tenzen-y @kubeflow/wg-training-leads @Electronic-Waste @saileshd1402 @astefanutti @franciscojavierarceo @saileshd1402 @seanlaii @kannon92