fix(sdk): resolve errors in deserialization #2457

Merged

Electronic-Waste
Member

What this PR does / why we need it:

This PR fixes an error that occurred during deserialization (sdk/kubeflow/trainer/api_client.py):

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api/trainer_client.py", line 103, in list_runtimes
    runtime = self.api_client.deserialize(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 280, in deserialize
    return self.__deserialize(data, response_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 308, in __deserialize
    klass = getattr(kubeflow.trainer.models, klass)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'kubeflow.trainer.models' has no attribute 'K8sIoApimachineryPkgUtilIntstrIntOrString'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/kubeflow-trainer/torch.py", line 3, in <module>
    for r in TrainerClient().list_runtimes():
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kubeflow/trainer/api/trainer_client.py", line 160, in list_runtimes
    raise RuntimeError(
RuntimeError: Failed to list ClusterTrainingRuntimes in namespace: kubeflow-system
Stream closed EOF for kubeflow-system/zedd-trainer-pod (trainer-container)

I made these changes:

  1. Add the kubernetes.client import in hack/python-sdk/gen-sdk.sh
  2. Add a type conversion in swagger.json
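
As a rough illustration of why the import matters: the traceback above shows model classes being resolved by name with getattr on the models package, so a class can only be found if it is present in that namespace. The sketch below uses hypothetical stand-ins (fake_models, V1ObjectMeta, and resolve are illustrative names, not the SDK's code) to show the lookup failing until the class is exported into the namespace, which is what a wildcard import does for every public name.

```python
import types

# Hypothetical stand-in for a generated models package
# (an assumption for illustration, not the real kubeflow.trainer.models).
models = types.ModuleType("fake_models")


class V1ObjectMeta:
    """Hypothetical model class, standing in for one defined in another package."""


def resolve(type_name: str):
    """Look up a model class by its string name, as the getattr call in the traceback does."""
    return getattr(models, type_name)


# Before the class is exported, the name is missing and the lookup fails.
try:
    resolve("V1ObjectMeta")
    resolved_before = True
except AttributeError:
    resolved_before = False

# Exporting the class into the package namespace makes the same lookup succeed.
models.V1ObjectMeta = V1ObjectMeta
resolved_after = resolve("V1ObjectMeta") is V1ObjectMeta
```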

/cc @kubeflow/wg-training-leads @astefanutti

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

@google-oss-prow google-oss-prow bot requested review from astefanutti and a team February 28, 2025 16:50
Contributor

@astefanutti astefanutti left a comment

LGTM, awesome!

Comment on lines -46 to 48
-# Import JobSet models for the serialization. It imports the Kubernetes models.
+# Import Kubernetes and JobSet models for the serialization.
+from kubernetes.client import *
 from jobset.models import *
Member Author

@@ -1,6 +1,8 @@
 {
   "packageName": "kubeflow.trainer",
   "typeMappings": {
     "K8sIoApiAutoscalingV2MetricSpec": "V2MetricSpec",
+    "K8sIoApimachineryPkgUtilIntstrIntOrString": "Union[int, str]",
Member

Should we use Union[int, str] or object here, since Union is not a standard type?

Member Author

I tried both options and found that Union[int, str] cannot be deserialized, since it's not a standard type:

Traceback (most recent call last):
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api/trainer_client.py", line 103, in list_runtimes
    runtime = self.api_client.deserialize(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 280, in deserialize
    return self.__deserialize(data, response_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 319, in __deserialize
    return self.__deserialize_model(data, klass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 658, in __deserialize_model
    kwargs[attr] = self.__deserialize(value, attr_type)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api_client.py", line 308, in __deserialize
    klass = getattr(kubeflow.trainer.models, klass)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'kubeflow.trainer.models' has no attribute 'Union[int, str]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ws/kubeflow/trainer-example/list_runtime.py", line 4, in <module>
    for r in TrainerClient().list_runtimes():
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ws/miniconda3/envs/training-operator/lib/python3.11/site-packages/kubeflow/trainer/api/trainer_client.py", line 160, in list_runtimes
    raise RuntimeError(
RuntimeError: Failed to list ClusterTrainingRuntimes in namespace: default

In this case, I think we should use object, as the Kubernetes Python client did in kubernetes-client/python#366.

And it works well:

$ python list_runtime.py
Runtime: mpi-distributed
Runtime: torch-distributed
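
To make the difference between the two mappings concrete, here is a minimal sketch of a name-based type resolver. It is a simplified assumption about how such generated clients commonly work, not the SDK's actual api_client.py: NATIVE_TYPES, fake_models, and deserialize are hypothetical names. A name found in the native table resolves directly, and anything else falls through to a getattr on the models package, which is where "Union[int, str]" fails.

```python
import types

# Hypothetical table of natively understood type names (an assumption for illustration).
NATIVE_TYPES = {"int": int, "float": float, "str": str, "bool": bool, "object": object}

# Empty stand-in for the generated models package.
models = types.ModuleType("fake_models")


def deserialize(data, type_name: str):
    """Resolve a type name, then convert the raw JSON value."""
    if type_name in NATIVE_TYPES:
        klass = NATIVE_TYPES[type_name]
    else:
        # Any other name must be an attribute of the models package.
        klass = getattr(models, type_name)
    if klass is object:
        # "object" passes the value through unchanged, so int and str both work.
        return data
    return klass(data)


# An IntOrString-style field may hold either kind of value.
port_as_int = deserialize(8080, "object")
port_as_name = deserialize("http", "object")

# The string "Union[int, str]" is neither a native name nor a model attribute.
try:
    deserialize(8080, "Union[int, str]")
    union_error = None
except AttributeError as exc:
    union_error = str(exc)
```

With object, the value is returned as-is, which suits a field that can legitimately be either an integer or a string.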

Member Author

@andreyvelich Sorry for the confusion.

Member

@Electronic-Waste Yes, please let's use object there.

Member

@andreyvelich andreyvelich left a comment

Thanks @Electronic-Waste!
/lgtm
/approve

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, astefanutti


@google-oss-prow google-oss-prow bot merged commit a6b4840 into kubeflow:master Mar 2, 2025
14 checks passed