Garbage Collection Segfaults on 1.11 from Enzyme + SymbolicRegression #56735

Open
MilesCranmer opened this issue Dec 2, 2024 · 9 comments
Labels
GC Garbage collector multithreading Base.Threads and related functionality regression 1.11 Regression in the 1.11 release

Comments

@MilesCranmer
Member

MilesCranmer commented Dec 2, 2024

With the following environment:

using Pkg

Pkg.activate(temp=true)
Pkg.add([
    PackageSpec(name="SymbolicRegression", version=v"1.0.3"),
    PackageSpec(name="Enzyme", rev="3ad827f69299299b92a1448f52dd746a65eb5db7"),
    PackageSpec(name="MLJBase", version=v"1.7.0"),
])

The following code segfaults on 1.11.1. However, it works fine on 1.10.7.

using SymbolicRegression, Enzyme, MLJBase

X = randn(Float64, 32, 2)
y = randn(Float64, 32)

model = SRRegressor(;
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
    autodiff_backend=:Enzyme,
    niterations=1000, # Larger because the segfault is stochastic
);

mach = machine(model, X, y)
fit!(mach)

The error message on 1.11.1 is:

[98721] signal 11 (2): Segmentation fault: 11 in expression starting at REPL[23]
gc_mark_obj8 at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-11/src/gc.c:0 [inlined]
gc_mark_outrefs at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-11/src/gc.c:2888 [inlined]
gc_mark_and_steal at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-11/src/gc.c:2993
gc_mark_loop_parallel at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-11/src/gc.c:3141
jl_parallel_gc_threadfun at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-R17H3W25T9.0/build/default-honeycrisp-R17H3W25T9-0/julialang/julia-release-1-dot-11/src/scheduler.c:151e-01
_pthread_start at /usr/lib/system/libsystem_pthread.dylib (unknown line)
Allocations: 323517950 (Pool: 323091059; Big: 426891); GC: 411

There is no error on 1.10.7.

Let me know what other debug info I can provide on this.

cc @wsmoses @vchuravy

x-ref EnzymeAD/Enzyme.jl#2081

@MilesCranmer MilesCranmer changed the title Garbage Collection Segfaults on 1.11 Garbage Collection Segfaults from Enzyme + SymbolicRegression on 1.11 Dec 2, 2024
@MilesCranmer MilesCranmer changed the title Garbage Collection Segfaults from Enzyme + SymbolicRegression on 1.11 Garbage Collection Segfaults on 1.11 from Enzyme + SymbolicRegression Dec 2, 2024
@MilesCranmer
Member Author

MilesCranmer commented Dec 2, 2024

When I run Julia 1.11 with --threads=auto --gcthreads=1, the segfault disappears. So might it be a race condition in the garbage collector?
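When comparing runs like this, it can help to confirm inside the session that the flags actually took effect. A minimal sketch, assuming Julia 1.10 or later (where `Threads.ngcthreads()` is available); the variable names are illustrative only:

```julia
# Report the thread configuration of the running Julia session.
# Threads.nthreads() counts compute threads in the default pool;
# Threads.ngcthreads() counts the configured GC threads (Julia 1.10+).
compute_threads = Threads.nthreads()
gc_threads = Threads.ngcthreads()

println("compute threads: ", compute_threads)
println("GC threads:      ", gc_threads)
```

Placing these lines at the top of the reproducer makes each crash log record which configuration produced it.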

@ViralBShah ViralBShah added GC Garbage collector multithreading Base.Threads and related functionality regression 1.11 Regression in the 1.11 release labels Dec 2, 2024
@vchuravy
Member

vchuravy commented Dec 5, 2024

How reliably does this reproduce for you? What machine are you running this on?

@vchuravy
Member

vchuravy commented Dec 5, 2024

This does also segfault for me with a single thread.

vchuravy@odin ~/s/s/julia_56735> ~/src/julia2/julia --project=. repr.jl 
[ Info: Training machine(SRRegressor(defaults = nothing, …), …).
┌ Warning: You are using multithreading mode, but only one thread is available. Try starting julia with `--threads=auto`.
└ @ SymbolicRegression ~/.julia/packages/SymbolicRegression/44X04/src/Configure.jl:59
[ Info: Started!

[1856060] signal 11 (1): Segmentation fault
in expression starting at /home/vchuravy/src/snippets/julia_56735/repr.jl:14
fish: Job 1, '~/src/julia2/julia --project=. …' terminated by signal SIGSEGV (Address boundary error)

@MilesCranmer
Member Author

You need to use --gcthreads=1 for the segfault to go away. The number of threads for Julia itself doesn't affect it.

@vchuravy
Member

vchuravy commented Dec 6, 2024

Please always post with versioninfo(;verbose=true)

No, you misunderstood. Without using any threads, your code eventually segmentation faults, for me.

So now it's difficult to tell if you are doing something fishy, Enzyme is doing something fishy, or if there is a bug in Julia.
Are Bumper and LoopVectorization turned off by default?
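The environment report requested above can also be produced from a script rather than the REPL. A small sketch: `versioninfo` lives in the `InteractiveUtils` standard library, which the REPL loads automatically but a script does not; the `report` variable is just an illustrative name.

```julia
using InteractiveUtils  # loaded automatically in the REPL, but not in scripts

# Capture the verbose report as a string so it can be pasted into the issue.
report = sprint(io -> versioninfo(io; verbose=true))
println(report)
```

The verbose form also lists environment variables, so it is worth skimming the output before posting it publicly.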

@MilesCranmer
Member Author

MilesCranmer commented Dec 6, 2024

Bumper and LoopVectorization are indeed disabled by default. (Their code paths are completely compiled away; Enzyme never interacts with them)

I can’t reproduce that segfault with --gcthreads=1. I only see it for more than one GC thread.

I’m on macOS, aarch64. I saw the GC segfault with 1.11.1 and now also 1.11.2.

@vchuravy
Member

vchuravy commented Dec 6, 2024

I can’t reproduce that segfault with

Did you try setting --threads=1?

@MilesCranmer
Member Author

Neither --threads=1 nor --threads=auto has any effect on the segfault for me; only --gcthreads matters.
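Since the two reports in this thread disagree about which flag matters, one way to compare them is to run the reproducer in fresh processes across several flag combinations. This is only a sketch under assumptions: the file name `repr.jl`, the helper `run_failed`, and the chosen thread counts are all hypothetical, and because the segfault is stochastic a single clean run per configuration proves little.

```julia
# Hypothetical file name; it should contain the reproducer from the first comment.
const SCRIPT = "repr.jl"

# Run `script` in a fresh Julia process with the given flags and return `true`
# if the process did not finish cleanly. Note that this counts any non-zero
# exit as a failure, not only a segfault.
function run_failed(script; threads="auto", gcthreads=1)
    cmd = `$(Base.julia_cmd()) --project=. --threads=$threads --gcthreads=$gcthreads $script`
    proc = run(ignorestatus(cmd))  # ignorestatus: do not throw when the child fails
    return !success(proc)
end

# Sweep a few combinations (the values are arbitrary choices for illustration).
if isfile(SCRIPT)
    for threads in ("1", "auto"), gcthreads in (1, 2, 4)
        failed = run_failed(SCRIPT; threads, gcthreads)
        println("threads=$threads gcthreads=$gcthreads failed=$failed")
    end
end
```

Repeating each configuration several times and posting the resulting table alongside `versioninfo` output would make it easier to tell whether the crash depends on the GC thread count, the platform, or neither.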

@gbaraldi
Member

Do we know whether this is an Enzyme-specific issue? I don't think Enzyme.jl is fully working on 1.11 yet; @wsmoses especially, and I, have been slogging through some things.
