Automatic Julia upgrade may be surprising #39

Open
amilsted opened this issue Oct 17, 2024 · 19 comments

@amilsted
Contributor

amilsted commented Oct 17, 2024

It's surprising to me that juliapkg automatically picks the latest compatible Julia version from juliaup's list, even if that version isn't installed and an older compatible version already is.

I'd prefer to default to using an installed version if it meets the Julia compat constraints. Partly this is because I want to have a way to keep a "recommended" version of Julia even if a newer version is available (see also #29).

Also, Julia Pkg now has the PRESERVE_TIERED_INSTALLED resolver mode that prefers to use compatible, already-installed versions of packages even if the registry has newer releases. It would be nice if juliapkg had a corresponding mode for Julia installation.

Could we maybe make this behavior configurable?
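
To make the request concrete, here is a minimal sketch of the selection rule being asked for. It is not juliapkg's actual code: `choose_julia`, its parameters, and the tuple representation of versions are all hypothetical, and the compat check is passed in as a plain predicate.

```python
from typing import Callable, Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def choose_julia(
    installed: Iterable[Version],
    available: Iterable[Version],
    is_compatible: Callable[[Version], bool],
    prefer_installed: bool = True,
) -> Tuple[Optional[Version], bool]:
    """Return (version, needs_download) for a hypothetical resolver.

    With prefer_installed=True, a compatible installed Julia always wins,
    even when a newer compatible release exists upstream.
    """
    ok_installed = [v for v in installed if is_compatible(v)]
    ok_available = [v for v in available if is_compatible(v)]
    if prefer_installed and ok_installed:
        return max(ok_installed), False
    if ok_available:
        best = max(ok_available)
        # Only download if the best upstream version is not already present.
        return best, best not in ok_installed
    if ok_installed:
        return max(ok_installed), False
    return None, False
```

With `prefer_installed=False` the function reproduces the behavior described above (always chase the newest compatible release), so the flag is effectively the configuration switch being proposed.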

@MilesCranmer

MilesCranmer commented Dec 4, 2024

+1

Yeah I think #29 would also help because we could fix a single recommended version. That way a new Julia release wouldn't instantly brick packages upon simply importing a Python package.

@toolslive

You don't wanna know how much money this cost us. When a new Julia was released, it was downloaded automagically on every node of a distributed cluster, individually.

@cjdoris
Collaborator

cjdoris commented Jan 15, 2025

Agreed that a setting to prefer existing installations would be good. In the meantime you can set the environment variable PYTHON_JULIAPKG_EXE to the path of a specific Julia executable if you want to keep juliapkg from ever installing one.
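
A minimal sketch of that workaround done from Python rather than the shell, assuming the variable only needs to be set before juliacall is first imported. The helper `pin_julia_exe` is hypothetical; only the PYTHON_JULIAPKG_EXE name comes from the comment above.

```python
import os
import shutil
from typing import Callable, MutableMapping, Optional


def pin_julia_exe(
    env: MutableMapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Point juliapkg at the Julia already on PATH, if there is one.

    Returns the executable path, or None when no Julia was found
    (in which case the environment is left untouched).
    """
    exe = which("julia")
    if exe is not None:
        env["PYTHON_JULIAPKG_EXE"] = exe
    return exe


# Intended use: call pin_julia_exe() first, then `import juliacall`.
```

The `env` and `which` parameters exist only so the helper can be exercised without touching the real environment.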

@toolslive

well, we had this in place:

    from juliacall import Pkg as jlPkg
    from juliacall import Main as jl
    ...
    jlPkg.offline(True)
    jlPkg.activate(project)

so we thought we were safe...
However, jlPkg.offline(True) comes too late: the damage is done as a side effect of importing juliacall (see juliacall/__init__.py, which calls init()).

So the only thing that works is going via the ENV vars.
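
A sketch of the environment-variable route with a guard against the "too late" case. It assumes juliapkg reads PYTHON_JULIAPKG_OFFLINE (with the value "yes") when it first resolves; the helper name `force_offline` is hypothetical.

```python
import os
import sys
from typing import Mapping, MutableMapping


def force_offline(
    env: MutableMapping[str, str] = os.environ,
    modules: Mapping[str, object] = sys.modules,
) -> None:
    """Switch juliapkg to offline mode, refusing to do so too late.

    Assumption: the variable must be set before juliacall is imported,
    because the import itself triggers resolution.
    """
    if "juliacall" in modules:
        raise RuntimeError(
            "juliacall is already imported; offline mode would be set too late"
        )
    env["PYTHON_JULIAPKG_OFFLINE"] = "yes"


# Intended use: call force_offline() first, then `import juliacall`.
```

Failing loudly when juliacall is already loaded turns a silent download into an immediate error, which is easier to catch in CI than on a cluster.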

@amilsted
Contributor Author

You can also import juliapkg before juliacall in order to change to offline status programmatically.

@amilsted
Contributor Author

Agreed that a setting to prefer existing installations would be good.

Could we maybe make the default behavior prefer existing installations?

@toolslive

Import side effects are really not ok. Just don't

@amilsted
Contributor Author

The entire julia and julia-dependency installation in juliapkg is an import side effect... Is this what you mean?

@MilesCranmer

At some point I feel like we should try to switch to doing the install at pip install time: #35 #16

Though it is indeed quite tricky due to the dynamic nature of Julia environments

@MilesCranmer

MilesCranmer commented Jan 16, 2025

Part of the issue is that for most Python backends, the dependencies can be entirely independent. JAX has its own compiled backend. PyTorch has its own compiled backend. While they aren’t compatible with each other (can’t do jax.numpy.sin on a torch tensor), they can be installed without knowing about each other’s existence.

Whereas for Julia backends of Python libraries, all the backend libraries can actually talk to each other and pass objects back and forth. Different backends all get to sit in the same combined environment and compile methods from one on objects of the other - which is great for compatibility across tools. But I think this is why it’s also way easier to do this environment config at import time. And why it is unusual compared to traditional Python backends.

I mean ideally I think it would be nice to set this up automatically. I think it’s just not as straightforward as one might expect though, because of this dynamism and cross-compatibility of Julia backends for Python.

@amilsted
Contributor Author

amilsted commented Jan 16, 2025

Yeah... juliapkg is trying to be the interface to Julia's package manager for all Julia packages required by python packages in the current virtual env. An alternative is to make the resolution process manually triggered - you run a command like python -m juliapkg resolve after your pip install or poetry install. This would have to harvest julia deps from the whole python virtual env somehow... which I guess might be easier if they were in pyproject.toml?
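
A rough sketch of what the "harvest" step of such a manually triggered command could look like. It assumes each Python package declares its Julia requirements in a juliapkg.json file at the top level of its package directory, with a "packages" mapping; `harvest_julia_deps` is a hypothetical name, and actual resolution of the collected requirements is left out.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def harvest_julia_deps(search_paths: Iterable[str]) -> Dict[str, List[dict]]:
    """Collect every requirement declared for each Julia package name.

    search_paths would typically be the site-packages directories of the
    current virtual env; missing directories are skipped.
    """
    found: Dict[str, List[dict]] = {}
    for root in search_paths:
        root_path = Path(root)
        if not root_path.is_dir():
            continue
        # Assumed layout: <site-packages>/<python_pkg>/juliapkg.json
        for spec_file in sorted(root_path.glob("*/juliapkg.json")):
            spec = json.loads(spec_file.read_text())
            for name, requirement in spec.get("packages", {}).items():
                # Remember which file asked for it, to help explain conflicts.
                entry = dict(requirement, source=str(spec_file))
                found.setdefault(name, []).append(entry)
    return found
```

Keeping all requirements per package, rather than merging them eagerly, leaves the conflict handling to whatever resolver runs afterwards.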

@toolslive

toolslive commented Jan 17, 2025

It can get worse:

  • I've seen the Julia side, unaware that it was running inside a Python process, start downloading and installing a Python interpreter so that it could call Python code from Julia.
  • the python process could be forked... (celery workers, gunicorn, ....) then you have a download/install/compile going on per process.

Also, it's manipulating the (project's) Project.toml & Manifest.toml files. It should not do this, it should just listen to them. They (should) have authority.

@MilesCranmer

Also, it's manipulating the Project.toml & Manifest.toml files. It should not do this, it should just listen to them. They (should) have authority

Manipulating the managed Project.toml and Manifest.toml is how juliapkg works. There’s no way to install stuff in Julia other than by manipulating those. But it owns and manages those files so this shouldn’t be unexpected. I suppose what is unexpected was these being changed at runtime, but this is kinda needed at the moment just due to the dynamic and shared nature of Julia environments - see the explanation above. But yeah it’d be nice to have this stuff happen at pip install time if it’s even possible.

(If you mean an externally non-managed project, this shouldn’t happen. So please submit a bug report if this does.)

But then again I suppose if the Manifest.toml is already compatible with juliapkg’s requirements, then it shouldn’t be updated. So perhaps that extra logic should be added, to prevent unintended updates.

the python process could be forked... (celery workers, gunicorn, ....) then you have a download/install/compile going on per process.

So if it’s a shared filesystem, then the compile will only happen in one process because Julia will use a file locking mechanism to prevent simultaneous precompilation. But the download of Julia itself I suppose is not locked since juliapkg manages this. So we should probably put in a patch for that.
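
One way such a patch could serialize the Julia download is a lock file created atomically. This is a sketch only, not what juliapkg does: `install_lock` and the lock path are hypothetical, and it does not handle stale locks left by a crashed process.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def install_lock(lock_path: str, timeout: float = 600.0, poll: float = 0.5):
    """Hold an exclusive lock file for the duration of the with-block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation atomic: exactly one process succeeds.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        # Record the owner's PID to help with manual cleanup.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

Each forked worker would wrap the download in `with install_lock(path):` and re-check whether Julia is already installed once it gets the lock, so only the first worker downloads. Whether exclusive creation is reliable on a given network filesystem would need checking.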


P.S., as a workaround to prevent these unintended updates, you should be freezing the version with:

import juliapkg
juliapkg.require_julia("=1.11.2")

If you write "1.11.2" by itself, this actually sets a minimum version (later compatible releases are still allowed) rather than fixing it.

The same applies to other dependencies.
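
To illustrate the difference, here is a toy model of the two specifier forms. It assumes juliapkg follows Julia's Pkg compat rules, where a bare "X.Y.Z" is a caret specifier. The functions `parse` and `satisfies` are hypothetical and handle only full three-part versions with a major version of 1 or more.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse a full "X.Y.Z" version string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies(version: str, spec: str) -> bool:
    """Check version against "=X.Y.Z" (exact) or "X.Y.Z" (caret, X >= 1)."""
    v = parse(version)
    if spec.startswith("="):
        return v == parse(spec[1:])
    low = parse(spec)
    if low[0] < 1:
        raise ValueError("this sketch does not model 0.x caret rules")
    # Caret rule for X >= 1: anything from X.Y.Z up to, but excluding, (X+1).0.0.
    return low <= v < (low[0] + 1, 0, 0)
```

Under this model `satisfies("1.12.0", "1.11.2")` is true while `satisfies("1.12.0", "=1.11.2")` is false, which is why the unprefixed form does not stop a new minor release from being picked up.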

@toolslive

You can also import juliapkg before juliacall in order to change to offline status programmatically.

if import order matters you're doing something wrong. really.

@MilesCranmer

@toolslive See #39 (comment)

@MilesCranmer

I mean this isn't even that uncommon a pattern

import matplotlib
matplotlib.use('Agg') # Change backend
import matplotlib.pyplot as plt

Lots of libraries involve manipulating the backend settings before loading the package. Another example:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import tensorflow as tf

@toolslive

Sigh. Even though the side effect of updating a global dict is probably suspect, it's limited and predictable. Compare this to downloading a kitchen sink and writing all over the file system. Anyway, we contained this behaviour via ENV vars.

(However, some of our celery workers, which are running in docker containers and calling Julia, crash with a SEGV. I have my suspicions, but I still need to get my hands on a core dump before I assign blame. SNAFU.)

@MilesCranmer

Ok so these are a few different things:

  1. Import order - As I explained, import order having some effect on behavior is kind of unavoidable in Python (unless written in pure Python - but basically nothing is). Lots of other libraries have this behavior too. Importing juliapkg before juliacall to configure it is not ideal but there's not really any other way.
  2. Downloading/updating when you don't need to - This is the core issue in this thread about preferring already-installed versions. I feel like this could be easily fixable(?). But this is also separate from when we do the install.
  3. Install-time vs import-time - Moving everything to pip install time is a much bigger architectural change (see #35, "discover dependencies in pyproject.toml files", and #16, "respect pyproject.toml"), and is much harder in general. This needs to be thought about separately from the other issues. (Any help is always appreciated.)

Regarding "writing all over the file system," it really only writes to:

  1. The Julia installation directory
  2. The local virtual env

Which are both required and [I would assume] expected. It's just when it installs that seems unexpected. Right?

@amilsted
Contributor Author

amilsted commented Jan 18, 2025

@toolslive It sounds like you might just not want what juliapkg provides at all. I sympathize, because I am also working on a python package with Julia dependencies in which we basically just bypass it - as you probably know, you can set certain juliacall environment variables to do this. Actually, we do use some of juliapkg. We use internal APIs to have it locate or install Julia (but not upgrade it if present!) and then switch it to offline mode completely. We tell juliacall the Julia location via the appropriate environment variable.

We manage Julia package dependencies by supplying a fixed Project.toml and Manifest.toml, which we ship with the package (to avoid surprises with upstream dependency changes we haven't tested against - every now and again some change in SciML breaks something, at least temporarily!).

The problem with this approach is that it rules out anybody using a different python package that uses juliacall and juliapkg together with ours. Fortunately, there aren't too many of these yet!

If Julia package management were integrated with Python package management, we could probably avoid a lot of these shenanigans.
