[Bug]: Update Beam containers to numpy 2.x #33639

Open
1 of 17 tasks
tvalentyn opened this issue Jan 17, 2025 · 3 comments · May be fixed by #33658
@tvalentyn
Contributor

tvalentyn commented Jan 17, 2025

What happened?

Currently, users who pip install apache-beam[gcp] get a newer version of numpy at job submission time than the version installed in the container images. This mismatch breaks DataFrame API users. From: https://lists.apache.org/thread/3k3rpnoh1tjf7d9rhvl88lmrn04fr9cn.

One of the jobs I ran (Java multi-lang that uses Python Dataframe) failed with the following error.

ModuleNotFoundError: No module named 'numpy._core.numeric'

Indeed in

we have numpy 1.x. We should try to upgrade the containers to use numpy 2.x for Python versions that support it. We should investigate which dependency is preventing Python 3.10+ containers from picking up numpy 2.x.
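
For context, numpy 2.x renamed its private numpy.core package to numpy._core, which is a plausible reason why objects serialized at submission time under numpy 2.x cannot be loaded in a numpy 1.x container. A minimal sketch of a pre-submission check for this kind of major-version skew follows. The function names are hypothetical and not part of Beam; 1.26.4 is the container version reported in this issue, while 2.2.1 is only an illustrative submission-side version.

```python
def major_version(version: str) -> int:
    """Return the leading major component of a dotted version string."""
    return int(version.split(".", 1)[0])


def numpy_major_mismatch(submission_version: str, container_version: str) -> bool:
    """True when the submission environment and the container differ in numpy major version."""
    return major_version(submission_version) != major_version(container_version)


if __name__ == "__main__":
    # Hypothetical submission-side version vs. the container version from this issue.
    submission, container = "2.2.1", "1.26.4"
    if numpy_major_mismatch(submission, container):
        print(f"numpy major version skew: submission={submission}, container={container}")
```

In practice the submission-side value could come from importlib.metadata.version("numpy"); how the container-side value is obtained is left open here.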

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@tvalentyn
Contributor Author

Currently numpy is downgraded due to the following:

pandas==2.1.4
├── numpy [required: >=1.22.4,<2, installed: 1.26.4]
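
The dependency tree above shows pandas capping numpy below 2. A similar check can be sketched with the standard library alone, scanning installed distributions for requirements that cap numpy at <2. This is a rough illustration, not how the Beam container requirements are generated: the helper names are hypothetical, and the regex handles only simple requirement strings (a real tool would use a full requirement parser).

```python
import re
from importlib import metadata

# Matches "name[extras] (specifiers)" or "name specifiers", stopping before any ";" marker.
_REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(?:\[[^\]]*\])?\s*\(?([^();]*)\)?")


def caps_numpy_below_2(requirement: str) -> bool:
    """Return True if the requirement string is for numpy and has an upper bound of <2."""
    match = _REQ.match(requirement)
    if not match or match.group(1).lower() != "numpy":
        return False
    specifiers = [spec.strip() for spec in match.group(2).split(",")]
    return any(re.fullmatch(r"<\s*2(\.0)*", spec) for spec in specifiers)


def packages_blocking_numpy_2():
    """List (package, requirement) pairs in the current environment that cap numpy at <2."""
    blockers = []
    for dist in metadata.distributions():
        for req in dist.requires or []:
            if caps_numpy_below_2(req):
                blockers.append((dist.metadata["Name"] or "<unknown>", req))
    return sorted(blockers)


if __name__ == "__main__":
    for name, req in packages_blocking_numpy_2():
        print(f"{name} requires {req}")
```

Run inside a container image, this would list the packages whose declared requirements keep numpy on 1.x in that environment.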

@tvalentyn
Contributor Author

We should support a newer version of Pandas.

@liferoad liferoad self-assigned this Jan 17, 2025
@liferoad
Contributor

Couple of related PRs: #33627 #33638
