fix(sdk): resolve errors in deserialization #2457

Electronic-Waste · 2025-02-28T16:50:10Z

What this PR does / why we need it:

This PR fixes an error raised during deserialization in sdk/kubeflow/trainer/api_client.py, which makes TrainerClient().list_runtimes() fail:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api/trainer_client.py", line 103, in list_runtimes
    runtime = self.api_client.deserialize(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 280, in deserialize
    return self.__deserialize(data, response_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 308, in __deserialize
    klass = getattr(kubeflow.trainer.models, klass)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'kubeflow.trainer.models' has no attribute 'K8sIoApimachineryPkgUtilIntstrIntOrString'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/kubeflow-trainer/torch.py", line 3, in <module>
    for r in TrainerClient().list_runtimes():
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api/trainer_client.py", line 160, in list_runtimes
    raise RuntimeError(
RuntimeError: Failed to list ClusterTrainingRuntimes in namespace: kubeflow-system
Stream closed EOF for kubeflow-system/zedd-trainer-pod (trainer-container)

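This PR makes the following changes:

Add the kubernetes.client import (from kubernetes.client import *) to the generated models package in hack/python-sdk/gen-sdk.sh
Map the generated Kubernetes type names (e.g. K8sIoApimachineryPkgUtilIntstrIntOrString) to resolvable types in hack/python-sdk/swagger_config.json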
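The error is raised by the type lookup at line 308 of api_client.py: a type string that is not a native type is resolved with getattr(kubeflow.trainer.models, name). The sketch below is a hypothetical check, not part of the SDK, that reports which type names would fail that lookup, with and without typeMappings-style renames like the ones this PR adds. The native-type set, the `list[...]`/`dict(...)` container syntax, and the stand-in `models` module are assumptions modeled on typical OpenAPI-generated Python clients; in the real toolchain the renames are applied when the SDK is generated, not at lookup time.

```python
import re
import types

# Assumed native-type names, modeled on typical OpenAPI-generated Python clients.
NATIVE_TYPES = {"int", "float", "str", "bool", "date", "datetime", "object"}


def innermost(type_str):
    """Strip list[...] and dict(key, ...) wrappers down to the element type name."""
    while True:
        if m := re.fullmatch(r"list\[(.*)\]", type_str):
            type_str = m.group(1)
        elif m := re.fullmatch(r"dict\([^,]*, (.*)\)", type_str):
            type_str = m.group(1)
        else:
            return type_str


def unresolvable(type_strs, models, type_mappings):
    """Return names that would make getattr(models, name) raise AttributeError."""
    missing = []
    for type_str in type_strs:
        name = innermost(type_str)
        # typeMappings-style rename, applied to the element type name.
        name = type_mappings.get(name, name)
        if name not in NATIVE_TYPES and not hasattr(models, name):
            missing.append(name)
    return missing


# Hypothetical stand-in for kubeflow.trainer.models after the fix,
# with V2MetricSpec re-exported from the Kubernetes client.
models = types.ModuleType("models")
models.V2MetricSpec = type("V2MetricSpec", (), {})

# Hypothetical type strings as they might appear in generated models.
generated = [
    "K8sIoApimachineryPkgUtilIntstrIntOrString",
    "list[K8sIoApiAutoscalingV2MetricSpec]",
    "str",
]

print(unresolvable(generated, models, {}))
# ['K8sIoApimachineryPkgUtilIntstrIntOrString', 'K8sIoApiAutoscalingV2MetricSpec']

mappings = {
    "K8sIoApiAutoscalingV2MetricSpec": "V2MetricSpec",
    "K8sIoApimachineryPkgUtilIntstrIntOrString": "object",
}
print(unresolvable(generated, models, mappings))
# []
```

Both changes are needed: the mapping alone only helps if the mapped name (here V2MetricSpec) is actually bound in the models package, and a mapping to a non-native string such as Union[int, str] would still be reported as unresolvable.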

/cc @kubeflow/wg-training-leads @astefanutti

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

Docs included if any changes are user facing


astefanutti

LGTM, awesome!

Electronic-Waste · 2025-02-28T16:55:20Z

sdk/kubeflow/trainer/models/__init__.py

-# Import JobSet models for the serialization. It imports the Kubernetes models.
+# Import Kubernetes and JobSet models for the serialization. 
+from kubernetes.client import *
 from jobset.models import *


The JobSet SDK's models package does not import the Kubernetes models, so they have to be imported explicitly: https://github.com/kubernetes-sigs/jobset/blob/main/sdk/python/jobset/models/__init__.py
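Since getattr only finds names bound in kubeflow.trainer.models, a Kubernetes model class such as V2MetricSpec resolves only if it is re-exported there, which is what the added star import does. The sketch below reproduces that with throwaway in-memory modules; `fake_kubernetes_client` and `fake_trainer_models` are hypothetical stand-ins for kubernetes.client and kubeflow.trainer.models, not the real packages.

```python
import sys
import types

# Hypothetical stand-in for kubernetes.client exposing one model class.
k8s_client = types.ModuleType("fake_kubernetes_client")


class V2MetricSpec:
    """Placeholder for the Kubernetes client's V2MetricSpec model."""


k8s_client.V2MetricSpec = V2MetricSpec
sys.modules["fake_kubernetes_client"] = k8s_client

# Hypothetical stand-in for kubeflow.trainer.models before the fix.
models = types.ModuleType("fake_trainer_models")
print(hasattr(models, "V2MetricSpec"))  # False: getattr would raise AttributeError

# Equivalent of adding `from kubernetes.client import *` to models/__init__.py:
# every public name of the client module becomes an attribute of the models module.
exec("from fake_kubernetes_client import *", models.__dict__)
print(getattr(models, "V2MetricSpec") is V2MetricSpec)  # True
```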

andreyvelich · 2025-02-28T17:03:32Z

hack/python-sdk/swagger_config.json

@@ -1,6 +1,8 @@
 {
  "packageName": "kubeflow.trainer",
  "typeMappings": {
+    "K8sIoApiAutoscalingV2MetricSpec": "V2MetricSpec",
+    "K8sIoApimachineryPkgUtilIntstrIntOrString": "Union[int, str]",


Should we use Union[int, str] or object here, since Union is not a standard type?

I've tried both options, and Union[int, str] cannot be deserialized: the generated client does not treat it as a native type, so it tries to look it up as a model class in kubeflow.trainer.models:

Traceback (most recent call last):
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api/trainer_client.py", line 103, in list_runtimes
    runtime = self.api_client.deserialize(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 280, in deserialize
    return self.__deserialize(data, response_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 308, in __deserialize
    klass = getattr(kubeflow.trainer.models, klass)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'kubeflow.trainer.models' has no attribute 'Union[int, str]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ws/kubeflow/trainer-example/list_runtime.py", line 4, in <module>
    for r in TrainerClient().list_runtimes():
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api/trainer_client.py", line 160, in list_runtimes
    raise RuntimeError(
RuntimeError: Failed to list ClusterTrainingRuntimes in namespace: default

In this case, I think we should use object, as the Kubernetes Python client did in kubernetes-client/python#366.

With object, listing runtimes works:

$ python list_runtime.py
Runtime: mpi-distributed
Runtime: torch-distributed
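The reason object works where Union[int, str] fails is that the generated deserializer treats only a fixed set of type strings as native; any other string, Union[int, str] included, is looked up as a model class. An IntOrString field (for example a port given as 8080 or "http") needs a type that passes the raw value through unchanged, which object does. The sketch below is a simplified, hypothetical re-creation of that dispatch: `deserialize`, `NATIVE_TYPES_MAPPING`, the `openapi_types` attribute, and the `WithObject`/`WithUnion` models are illustrative assumptions, not the SDK's actual code, and it omits details such as JSON key renaming and date parsing.

```python
import types

# Assumed subset of the native-type table used by OpenAPI-generated Python clients.
NATIVE_TYPES_MAPPING = {"int": int, "float": float, "str": str, "bool": bool, "object": object}

# Hypothetical stand-in for kubeflow.trainer.models.
models = types.ModuleType("models")


def deserialize(data, klass):
    """Simplified dispatch: native type strings first, otherwise a model lookup."""
    if data is None:
        return None
    if isinstance(klass, str):
        if klass in NATIVE_TYPES_MAPPING:
            klass = NATIVE_TYPES_MAPPING[klass]
        else:
            # Mirrors `getattr(kubeflow.trainer.models, klass)` from the traceback.
            klass = getattr(models, klass)
    if klass is object:
        return data  # IntOrString values pass through as int or str
    if klass in (int, float, str, bool):
        return klass(data)
    # Hypothetical model layout: attribute name -> type string.
    kwargs = {attr: deserialize(data.get(attr), t) for attr, t in klass.openapi_types.items()}
    return klass(**kwargs)


class WithObject:
    """Hypothetical model whose IntOrString field is mapped to object."""
    openapi_types = {"port": "object"}

    def __init__(self, port=None):
        self.port = port


class WithUnion:
    """Hypothetical model whose IntOrString field is mapped to Union[int, str]."""
    openapi_types = {"port": "Union[int, str]"}

    def __init__(self, port=None):
        self.port = port


models.WithObject = WithObject
models.WithUnion = WithUnion

print(deserialize({"port": 8080}, "WithObject").port)    # 8080
print(deserialize({"port": "http"}, "WithObject").port)  # http

try:
    deserialize({"port": 8080}, "WithUnion")
except AttributeError as err:
    union_error = str(err)
print(union_error)  # module 'models' has no attribute 'Union[int, str]'
```

Mapping to object trades static typing for correctness: the field keeps whichever JSON type the API returned, and callers check isinstance(value, int) themselves when they need to distinguish the two forms.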

@andreyvelich Sorry for the confusion.

@Electronic-Waste Yes, please let's use object there.


andreyvelich

Thanks @Electronic-Waste!
/lgtm
/approve

google-oss-prow · 2025-03-02T21:10:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, astefanutti

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

~~OWNERS~~ [andreyvelich]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fix(sdk): import kubernetes.client & make type conversion in swagger.json.

c3773b3

Signed-off-by: Electronic-Waste <[email protected]>

google-oss-prow bot requested review from astefanutti and a team February 28, 2025 16:50

google-oss-prow bot added the size/M label Feb 28, 2025

astefanutti approved these changes Feb 28, 2025


Electronic-Waste commented Feb 28, 2025


andreyvelich reviewed Feb 28, 2025


fix(sdk): change Union[int, str] to object.

3697e01

Signed-off-by: Electronic-Waste <[email protected]>

Electronic-Waste force-pushed the fix/deserialization branch from 2aff571 to 3697e01 March 1, 2025 04:19

andreyvelich reviewed Mar 2, 2025


google-oss-prow bot assigned andreyvelich Mar 2, 2025

google-oss-prow bot added the lgtm label Mar 2, 2025

google-oss-prow bot added the approved label Mar 2, 2025

google-oss-prow bot merged commit a6b4840 into kubeflow:master Mar 2, 2025
14 checks passed
