update codeflare-common dependency #639

Closed
dgrove-oss wants to merge 1 commit from cf-common-bump


dgrove-oss
Collaborator

Updating the codeflare-common dependency fixes this build problem:

(base) dgrove@Dave's IBM Mac codeflare-operator % SED=gsed make modules 
go get github.com/ray-project/kuberay/ray-operator@v1.1.0
go: downgraded github.com/project-codeflare/appwrapper v0.27.0 => v0.14.2
go: downgraded github.com/ray-project/kuberay/ray-operator v1.2.1 => v1.1.0
go: downgraded sigs.k8s.io/kueue v0.8.3 => v0.7.0-rc.1
go get sigs.k8s.io/kueue@v0.8.3
go: upgraded github.com/ray-project/kuberay/ray-operator v1.1.0 => v1.2.1
go: upgraded sigs.k8s.io/kueue v0.7.0-rc.1 => v0.8.3
go get github.com/project-codeflare/appwrapper@v0.27.0
go: upgraded github.com/project-codeflare/appwrapper v0.14.2 => v0.27.0
go mod tidy
go: finding module for package k8s.io/apimachinery/pkg/runtime/json
go: github.com/project-codeflare/codeflare-operator/test/e2e imports
	github.com/project-codeflare/codeflare-common/support imports
	k8s.io/apimachinery/pkg/runtime/json: module k8s.io/apimachinery@latest found (v0.32.0), but does not contain package k8s.io/apimachinery/pkg/runtime/json
make: *** [modules] Error 1
(base) dgrove@Dave's IBM Mac codeflare-operator % more go.mod   
module github.com/project-codeflare/codeflare-operator


openshift-ci bot commented Dec 13, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tedhtchang for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dgrove-oss
Collaborator Author

The e2e test failure looks like a Ray dependency problem. I am not going to debug it; I'll leave it to someone on the codeflare team to triage and fix. The relevant log is:

Could not create the actor because its associated runtime env failed to be created.
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.11/site-packages/ray/_private/runtime_env/agent/runtime_env_agent.py", line 381, in _create_runtime_env_with_retry
    runtime_env_context = await asyncio.wait_for(
                          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/tasks.py", line 489, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/ray/_private/runtime_env/agent/runtime_env_agent.py", line 346, in _setup_runtime_env
    await create_for_plugin_if_needed(
  File "/opt/app-root/lib64/python3.11/site-packages/ray/_private/runtime_env/plugin.py", line 254, in create_for_plugin_if_needed
    size_bytes = await plugin.create(uri, runtime_env, context, logger=logger)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/ray/_private/runtime_env/pip.py", line 518, in create
    bytes = await task
            ^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/ray/_private/runtime_env/pip.py", line 498, in _create_for_hash
    await PipProcessor(
  File "/opt/app-root/lib64/python3.11/site-packages/ray/_private/runtime_env/pip.py", line 400, in _run
    await self._install_pip_packages(
  File "/opt/app-root/lib64/python3.11/site-packages/ray/_private/runtime_env/pip.py", line 376, in _install_pip_packages
    await check_output_cmd(pip_install_cmd, logger=logger, cwd=cwd, env=pip_env)
  File "/opt/app-root/lib64/python3.11/site-packages/ray/_private/runtime_env/utils.py", line 105, in check_output_cmd
    raise SubprocessCalledProcessError(
ray._private.runtime_env.utils.SubprocessCalledProcessError: Run cmd[9] failed with the following details.
Command '['/tmp/ray/session_2024-12-13_19-57-16_218119_1/runtime_resources/pip/4c63697276479c6b28458e4db0df558b6a6070dc/virtualenv/bin/python', '-m', 'pip', 'install', '--disable-pip-version-check', '--no-cache-dir', '-r', '/tmp/ray/session_2024-12-13_19-57-16_218119_1/runtime_resources/pip/4c63697276479c6b28458e4db0df558b6a6070dc/ray_runtime_env_internal_pip_requirements.txt']' returned non-zero exit status 1.
Last 50 lines of stdout:
    Looking in indexes: https://pypi.python.org/simple
    Collecting pytorch_lightning==1.9.5 (from -r /tmp/ray/session_2024-12-13_19-57-16_218119_1/runtime_resources/pip/4c63697276479c6b28458e4db0df558b6a6070dc/ray_runtime_env_internal_pip_requirements.txt (line 1))
      Obtaining dependency information for pytorch_lightning==1.9.5 from https://files.pythonhosted.org/packages/77/ed/7d91e1958f0d48b439fae0de8ece3de3ce8c3d4e03b04bd3c007ba879a49/pytorch_lightning-1.9.5-py3-none-any.whl.metadata
      Downloading pytorch_lightning-1.9.5-py3-none-any.whl.metadata (23 kB)
    Collecting torchmetrics==0.9.1 (from -r /tmp/ray/session_2024-12-13_19-57-16_218119_1/runtime_resources/pip/4c63697276479c6b28458e4db0df558b6a6070dc/ray_runtime_env_internal_pip_requirements.txt (line 2))
      Obtaining dependency information for torchmetrics==0.9.1 from https://files.pythonhosted.org/packages/a6/52/1eb3b8fcf4e0d0f79cd8f637880f70e83b1cbb2543e74d89280295ed66ef/torchmetrics-0.9.1-py3-none-any.whl.metadata
      Downloading torchmetrics-0.9.1-py3-none-any.whl.metadata (20 kB)
    ERROR: Could not find a version that satisfies the requirement torchvision==0.12.0 (from versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.17.2, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.20.0, 0.20.1)
    ERROR: No matching distribution found for torchvision==0.12.0
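
For readers triaging this log: the install fails because the pinned `torchvision==0.12.0` is not among the versions pip listed for this environment. A likely cause is that no 0.12.0 wheel matches the Python 3.11 interpreter shown in the traceback, but that is an assumption, not something the log states. The sketch below copies the version list from the error above and checks the pin against it. The helper names `is_installable`, `version_key`, and `newer_candidates` are hypothetical and are not part of pip or Ray.

```python
# Versions pip reported as available for torchvision, copied from the log above.
AVAILABLE = [
    "0.1.6", "0.1.7", "0.1.8", "0.1.9", "0.2.0", "0.2.1", "0.2.2",
    "0.2.2.post2", "0.2.2.post3", "0.15.0", "0.15.1", "0.15.2",
    "0.16.0", "0.16.1", "0.16.2", "0.17.0", "0.17.1", "0.17.2",
    "0.18.0", "0.18.1", "0.19.0", "0.19.1", "0.20.0", "0.20.1",
]

# The pin from the runtime env requirements that pip could not satisfy.
REQUESTED = "0.12.0"


def is_installable(pin, candidates):
    """Return True if the exact pinned version is among the candidates."""
    return pin in set(candidates)


def version_key(version):
    """Convert '0.15.2' to (0, 15, 2); stops at suffixes such as 'post2'."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def newer_candidates(pin, candidates):
    """List the candidates that sort after the pin, in their original order."""
    return [c for c in candidates if version_key(c) > version_key(pin)]


if __name__ == "__main__":
    print("pin installable:", is_installable(REQUESTED, AVAILABLE))
    print("newer versions:", newer_candidates(REQUESTED, AVAILABLE))
```

The second list shows which pins the test requirements could move to. Whether those versions are compatible with the pinned `pytorch_lightning` and `torchmetrics` would still need to be checked separately.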

@dgrove-oss
Collaborator Author

It's possible that project-codeflare/codeflare-common#85, which aligns the version of kuberay between codeflare-common and codeflare-operator, would help.

@sutaakar
Contributor

@dgrove-oss I wasn't able to reproduce the build issue; on the latest main branch the build passes.
Which Go version do you use? Mine is 1.22.4.

@sutaakar
Contributor

I will check the e2e test issue; I suppose it is caused by the Ray image upgrades. Thanks for pointing that out.

@dgrove-oss
Collaborator Author

> @dgrove-oss I wasn't able to reproduce the build issue; on the latest main branch the build passes. Which Go version do you use? Mine is 1.22.4.

I have Go 1.23.4 (needed to work with upstream kueue). From the message, it looks related to the release of Kubernetes 1.32 last week, but these Go dependency issues are sometimes mysterious. 🤷
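
For anyone comparing this failure across Go versions, here is a minimal sketch that pulls the module, the resolved version, and the missing package out of the `go mod tidy` error quoted at the top of this PR. The regular expression is written against that one message only; it is an assumption about its shape, not a description of every error the Go toolchain can emit. The name `parse_missing_package` is hypothetical.

```python
import re

# Matches messages shaped like the one in the build log above:
#   "<pkg>: module <module>@<query> found (<version>), but does not contain package <pkg>"
MISSING_PACKAGE = re.compile(
    r"module (?P<module>\S+)@(?P<query>\S+) found \((?P<version>[^)]+)\), "
    r"but does not contain package (?P<package>\S+)"
)


def parse_missing_package(line):
    """Return a dict of the fields in the error line, or None if it does not match."""
    match = MISSING_PACKAGE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # The error line copied from the build log in this PR.
    error = (
        "k8s.io/apimachinery/pkg/runtime/json: module k8s.io/apimachinery@latest "
        "found (v0.32.0), but does not contain package "
        "k8s.io/apimachinery/pkg/runtime/json"
    )
    print(parse_missing_package(error))
```

The extracted fields show that the lookup resolved `k8s.io/apimachinery@latest` to v0.32.0, which is consistent with the guess that the Kubernetes 1.32 release is involved.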

@sutaakar
Contributor

#641 should fix the e2e tests on the newest codeflare-common.

@dgrove-oss
Collaborator Author

Closed in favor of #641.

@dgrove-oss dgrove-oss closed this Dec 16, 2024
@dgrove-oss dgrove-oss deleted the cf-common-bump branch December 16, 2024 19:27