Crash when tuning SVC with multithreading in MLJ.jl

I have been trying to tune a set of hyperparameters of an SVC from LIBSVM.jl. Tuning works when I use only one thread, but it crashes nondeterministically with multiple threads.

Not only is the crash intermittent, but the messages printed after a crash also differ from run to run.

Below is an MWE that demonstrates the crash and the different crash messages.

# filename: testcrash.jl
using MLJ

SVC = @load SVC pkg=LIBSVM verbosity=0
X = MLJ.table(rand(256, 1000)')
y = categorical(rand(1:12, 1000))

svc = SVC()
tuned_model = TunedModel(
    model=svc,
    tuning=Grid(resolution=100),
    range=range(svc, :cost, lower=1e-2, upper=1e10, scale=:log),
    measure=accuracy,
    acceleration=CPUThreads(),
    acceleration_resampling=CPUThreads(),
)
mach = machine(tuned_model, X, y)
fit!(mach, verbosity=0)
Crash message 1
$ julia --banner=no --project=. -t 8 testcrash.jl

[93738] signal 11 (2): Segmentation fault: 11
in expression starting at /Users/yuanrulin/multithreads_crash/testcrash.jl:17
julia(93738,0x16da1f000) malloc: *** error for object 0x13: pointer being freed was not allocated
julia(93738,0x16da1f000) malloc: *** set a breakpoint in malloc_error_break to debug

[93738] signal 6: Abort trap: 6
in expression starting at /Users/yuanrulin/multithreads_crash/testcrash.jl:17
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 55132819 (Pool: 55129833; Big: 2986); GC: 74ifacts/1db1a4e6cb067e9d3da16d6b928200e388e6f969/lib/libsvm.dylib (unknown line)
Allocations: 55132819 (Pool: 55129833; Big: 2986); GC: 74
[1]    93738 abort      julia --banner=no --project=. -t 8 testcrash.jl
Crash message 2
$ julia --banner=no --project=. -t 8 testcrash.jl
OMP: Error #13: Assertion failure at kmp_csupport.cpp(607).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://github.com/llvm/llvm-project/issues/.

[93754] signal 6: Abort trap: 6
in expression starting at /Users/yuanrulin/multithreads_crash/testcrash.jl:17
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 55053336 (Pool: 55050435; Big: 2901); GC: 74
[1]    93754 abort      julia --banner=no --project=. -t 8 testcrash.jl
Crash message 3
$ julia --banner=no --project=. -t 8 testcrash.jl

[93856] signal 11 (2): Segmentation fault: 11
in expression starting at /Users/yuanrulin/multithreads_crash/testcrash.jl:17
[1]    93856 segmentation fault  julia --banner=no --project=. -t 8 testcrash.jl
Crash message 4
$ julia --banner=no --project=. -t 8 testcrash.jl
julia(93886,0x16eaab000) malloc: Corruption at 0x143e2b9e0: unexpected msizes 0/18
julia(93886,0x16eaab000) malloc: *** set a breakpoint in malloc_error_break to debug

[93886] signal 6: Abort trap: 6
in expression starting at /Users/yuanrulin/multithreads_crash/testcrash.jl:17
^C
[93886] signal 2: Interrupt: 2
in expression starting at /Users/yuanrulin/multithreads_crash/testcrash.jl:17
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__ulock_wait2 at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__ulock_wait2 at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__ulock_wait2 at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
^C
versioninfo()

Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 10 × Apple M2 Pro
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 6 default, 0 interactive, 3 GC (on 6 virtual cores)
Environment:
JULIA_PROJECT = @work

@ablaom


Thanks @Yuan-Ru-Lin for documenting your issue so carefully.

According to this issue, LIBSVM and some (but not all) other non-Julia models do not play well with certain kinds of model composition when using multithreading. The errors you report look at least superficially similar to those reported there.

I guess the management of processes by LIBSVM (binary code compiled from C) collides with what Julia is trying to do, but I don’t pretend to understand this kind of thing very well. I’m doubtful this has a simple resolution and the best I can offer is to drop multithreaded tuning for this particular model.
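As a rough sketch of what dropping multithreaded tuning could look like for the MWE above: both acceleration options are set to `CPU1()`, MLJ's single-threaded resource. The name `serial_model` is just an illustrative choice, and the grid and range are copied from the original script.

```julia
using MLJ

SVC = @load SVC pkg=LIBSVM verbosity=0
svc = SVC()

# Same tuning setup as in the MWE, but with no multithreading at either level.
serial_model = TunedModel(
    model=svc,
    tuning=Grid(resolution=100),
    range=range(svc, :cost, lower=1e-2, upper=1e10, scale=:log),
    measure=accuracy,
    acceleration=CPU1(),             # evaluate grid points one at a time
    acceleration_resampling=CPU1(),  # evaluate resampling folds one at a time
)
```

This trades speed for safety, so the full grid of 100 cost values will take correspondingly longer to fit.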

It may be you can get away with multithreading at the top level (acceleration, but not acceleration_resampling) but I’m guessing you tried that already.

Sorry I can’t be more helpful.


Update: Okay, it appears LIBSVM is not thread-safe, so the issue lies there and not with MLJ. See "Can we make the package thread-safe?" (Issue #60, JuliaML/LIBSVM.jl on GitHub).


Thanks for the prompt and detailed reply. It feels like a miracle, but commenting out the acceleration_resampling line (while keeping acceleration) resolves the crash.
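For anyone landing here later, this is roughly what the modified script looks like. It is the MWE from the first post with the `acceleration_resampling` line commented out, so resampling falls back to its default (assumed here to be `CPU1()`). Since LIBSVM is reported not to be thread-safe, treat this as a workaround that stopped the crash in this particular setup, not as a guarantee.

```julia
# filename: testcrash.jl (modified)
using MLJ

SVC = @load SVC pkg=LIBSVM verbosity=0
X = MLJ.table(rand(256, 1000)')
y = categorical(rand(1:12, 1000))

svc = SVC()
tuned_model = TunedModel(
    model=svc,
    tuning=Grid(resolution=100),
    range=range(svc, :cost, lower=1e-2, upper=1e10, scale=:log),
    measure=accuracy,
    acceleration=CPUThreads(),                # threads at the top (tuning) level only
    # acceleration_resampling=CPUThreads(),   # removed: resampling runs serially
)
mach = machine(tuned_model, X, y)
fit!(mach, verbosity=0)
```

It is run the same way as before, with `julia --banner=no --project=. -t 8 testcrash.jl`.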
