-
Notifications
You must be signed in to change notification settings - Fork 738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sdk): Generate external Kubernetes and JobSet models #2466
feat(sdk): Generate external Kubernetes and JobSet models #2466
Conversation
for _, dep := range info.Deps { | ||
if dep.Path == "k8s.io/api" { | ||
k8sVersion = strings.Replace(dep.Version, "v0.", "v1.", -1) | ||
} else if dep.Path == "sigs.k8s.io/jobset" { | ||
jobSetVersion = dep.Version | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this look good to fetch Kubernetes and JobSet version ?
162056f
to
a3f732f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich Thanks for this. I left my initial comments for you.
hack/swagger/main.go
Outdated
// TODO (andreyvelich): Use the JobSet source when PR is merged. | ||
jobSetOpenAPISpec := "https://raw.githubusercontent.com/andreyvelich/jobset/refs/heads/fix-swagger-ref/hack/python-sdk/swagger.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we use the JobSet source now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to create PR to update the JobSet swagger.
Will do it soon.
hack/python-sdk/gen-sdk.sh
Outdated
docker run --rm \ | ||
-v "${TRAINER_ROOT}:/local" docker.io/openapitools/openapi-generator-cli:${OPENAPI_GENERATOR_VERSION} generate \ | ||
-g python \ | ||
-i "local/${SWAGGER_CODEGEN_FILE}" \ | ||
-c "local/${SWAGGER_CODEGEN_CONF}" \ | ||
-o "local/${SDK_OUTPUT_PATH}" \ | ||
-p=packageVersion="${SDK_VERSION}" \ | ||
--global-property apiTests=false,modelTests=false # TODO (andreyvelich): Discuss if we should use these test files. | ||
--global-property models,modelTests=false,supportingFiles=__init__.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should we use the container to generate Python SDK? Can we just use the jar file?
And also, Docker will put limit on the pull rate starting from 1st April. Is it good to download this image from Dockerhub?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, using docker container is preferable to make generation platform agnostic: kubernetes-sigs/jobset#190
And also, Docker will put limit on the pull rate starting from 1st April. Is it good to download this image from Dockerhub?
I think, that should be fine for now. If we see problems in our CI, maybe we can mirror this image to Kubeflow GHCR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@juliusvonkohout @Electronic-Waste Do we know if this image is available in ECR gallery ?
https://gallery.ecr.aws/
docker.io/openapitools/openapi-generator-cli
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"I think, that should be fine for now. If we see problems in our CI, maybe we can mirror this image to Kubeflow GHCR." Just make sure that you update it automatically and regularly to ghcr to fix CVEs.
a3f732f
to
0bd7876
Compare
0d27e69
to
d7a7abe
Compare
Also, I removed the Models docs from the OpenAPI generation for now. Does it look good ? |
d7a7abe
to
e0ae1dd
Compare
Signed-off-by: Andrey Velichkevich <[email protected]>
e0ae1dd
to
f844687
Compare
Signed-off-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
hack/e2e-setup-cluster.sh
Outdated
@@ -39,7 +39,14 @@ cd manifests/overlays/manager | |||
kustomize edit set image kubeflow/trainer-controller-manager=${CONTROLLER_MANAGER_CI_IMAGE} | |||
|
|||
echo "Create Kind cluster and load Kubeflow Trainer images" | |||
${KIND} create cluster --image "${KIND_NODE_VERSION}" | |||
cat <<EOF | ${KIND} create cluster --image "${KIND_NODE_VERSION}" --config=- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this config, so we can add more Kind workers in the future to decrease kube-scheduler time for placing Pods.
I think, that should speedup our tests once we run more TrainJobs at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you open a separate PR? Because this seems not to related to openapi-gen.
If we find any issue for Kind config adding, the dedicated PR allows us to easily revert and fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me open it.
Signed-off-by: Andrey Velichkevich <[email protected]>
@@ -36,11 +36,17 @@ PROJECT_DIR := $(shell dirname $(abspath $(lastword $(MAKEFILE_LIST)))) | |||
# Tool Binaries | |||
LOCALBIN ?= $(PROJECT_DIR)/bin | |||
|
|||
CONTROLLER_GEN ?= $(LOCALBIN)/controller-gen | |||
GINKGO ?= $(LOCALBIN)/ginkgo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tenzen-y I noticed a new error in our e2e:
Will run 1 of 1 specs
panic: test timed out after 10m0s
running tests:
TestAPIs (10m0s)
goroutine 2578 [running]:
testing.(*M).startAlarm.func1()
/opt/hostedtoolcache/go/1.23.0/x64/src/testing/testing.go:2373 +0x385
created by time.goFunc
/opt/hostedtoolcache/go/1.23.0/x64/src/time/sleep.go:215 +0x2d
goroutine 1 [chan receive, 10 minutes]:
testing.(*T).Run(0xc0000f44e0, {0x1999495?, 0x0?}, 0x1aaae28)
/opt/hostedtoolcache/go/1.23.0/x64/src/testing/testing.go:1751 +0x3ab
testing.runTests.func1(0xc0000f44e0)
/opt/hostedtoolcache/go/1.23.0/x64/src/testing/testing.go:2168 +0x37
testing.tRunner(0xc0000f44e0, 0xc0000a9c70)
/opt/hostedtoolcache/go/1.23.0/x64/src/testing/testing.go:1690 +0xf4
testing.runTests(0xc00038e2b8, {0x29962d0, 0x1, 0x1}, {0x4[74](https://github.com/kubeflow/trainer/actions/runs/13681536090/job/38254906117?pr=2466#step:7:75)a25?, 0xc000687350?, 0x29de060?})
/opt/hostedtoolcache/go/1.23.0/x64/src/testing/testing.go:2166 +0x43d
testing.(*M).Run(0xc0002aae60)
/opt/hostedtoolcache/go/1.23.0/x64/src/testing/testing.go:2034 +0x64a
main.main()
_testmain.go:45 +0x9b
goroutine 72 [select, 10 minutes]:
github.com/onsi/ginkgo/v2/internal.(*Suite).runNode(_, {0x6, 0x4, {0x19e625f, 0x35}, 0xc0006942a0, {{0x200d249, 0x59}, 0x2d, {0x0, ...}, ...}, ...}, ...)
/home/runner/work/trainer/trainer/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/internal/suite.go:916 +0xfc5
github.com/onsi/ginkgo/v2/internal.(*group).attemptSpec(0xc000169810, 0x1, {{0xc0001bd208?, 0xc000251b90?, 0x2?}, 0x0?})
/home/runner/work/trainer/trainer/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/internal/group.go:199 +0xc33
github.com/onsi/ginkgo/v2/internal.(*group).run(0xc000169810, {0xc000365280?, 0x0?, 0x0?})
/home/runner/work/trainer/trainer/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/internal/group.go:349 +0x905
github.com/onsi/ginkgo/v2/internal.(*Suite).runSpecs(0xc0001d2a88, {0x19b143a, 0x1a}, {0x2a009c0, 0x0, 0x0}, {0xc000050064, 0x4d}, 0x0, {0xc0001460a0, ...})
/home/runner/work/trainer/trainer/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/internal/suite.go:489 +0xa9b
github.com/onsi/ginkgo/v2/internal.(*Suite).Run(_, {_, _}, {_, _, _}, {_, _}, _, {0x1c58f30, ...}, ...)
/home/runner/work/trainer/trainer/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/internal/suite.go:130 +0x405
github.com/onsi/ginkgo/v2.RunSpecs({0x1c363c0, 0xc00026[78](https://github.com/kubeflow/trainer/actions/runs/13681536090/job/38254906117?pr=2466#step:7:79)60}, {0x19b143a, 0x1a}, {0x0, 0x0, 0x0})
/home/runner/work/trainer/trainer/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/core_dsl.go:300 +0x893
github.com/kubeflow/trainer/test/e2e.TestAPIs(0xc000267860)
/home/runner/work/trainer/trainer/go/src/github.com/kubeflow/trainer/test/e2e/suite_test.go:39 +0x45
testing.tRunner(0xc000267[86](https://github.com/kubeflow/trainer/actions/runs/13681536090/job/38254906117?pr=2466#step:7:87)0, 0x1aaae28)
/opt/hostedtoolcache/go/1.23.0/x64/src/testing/testing.go:1690 +0xf4
created by testing.(*T).Run in goroutine 1
/opt/hostedtoolcache/go/1.23.0/x64/src/testing/testing.go:1743 +0x390
Since we use ginkgo to run our tests, can we use their binary to run tests similar to JobSet and Kueue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ginkgo binary would be more better
Signed-off-by: Andrey Velichkevich <[email protected]>
93b4b7f
to
bbe2d4b
Compare
Signed-off-by: Andrey Velichkevich <[email protected]>
Signed-off-by: Andrey Velichkevich <[email protected]>
The e2e seems to be working. I don't think flakiness relates to this PR. |
/hold cancel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I upgraded the OpenAPI generator to the latest version using the Docker image, similar to JobSet:
docker.io/openapitools/openapi-generator-cli:v7.11.0
.In this version, we can serialize and deserialize objects directly within the Model classes, eliminating the need to use
FakeResponse
for serialization. However, it requires all external dependencies to be present in the models. For example,JobsetV1alpha2JobSet
usesV1ObjectMeta
from Kubernetes OpenAPI spec.Therefore, I suggest updating our
swagger.json
to reference external models from both Kubernetes and JobSet.TODO:
Please let me know what do you think ?
/assign @kubeflow/wg-training-leads @astefanutti @kannon92 @saileshd1402 @seanlaii @Electronic-Waste
/hold for review