Hello, has anyone experienced multithreading issues on Windows?
I just installed Julia on a brand new laptop running Windows 10 Home. A test script runs fine with JULIA_NUM_THREADS=1, but the same script with JULIA_NUM_THREADS=4 fails with the following error:
signal (22): SIGABRT
in expression starting at C:\Evovest\EvoTrees.jl\experiments\random_test.jl:36
crt_sig_handler at /cygdrive/d/buildbot/worker/package_win64/build/src\signals-win.c:92
raise at C:\Windows\System32\msvcrt.dll (unknown line)
abort at C:\Windows\System32\msvcrt.dll (unknown line)
assert at C:\Windows\System32\msvcrt.dll (unknown line)
uv_update_time at /workspace/srcdir/libuv\src/win\core.c:105
uv_run at /workspace/srcdir/libuv\src/win\core.c:371
jl_process_events at /cygdrive/d/buildbot/worker/package_win64/build/src\jl_uv.c:214
jl_task_get_next at /cygdrive/d/buildbot/worker/package_win64/build/src\partr.c:520
poptask at .\task.jl:704
wait at .\task.jl:712 [inlined]
task_done_hook at .\task.jl:442
jl_apply at /cygdrive/d/buildbot/worker/package_win64/build/src\julia.h:1690 [inlined]
jl_finish_task at /cygdrive/d/buildbot/worker/package_win64/build/src\task.c:198
start_task at /cygdrive/d/buildbot/worker/package_win64/build/src\task.c:717
Allocations: 28331201 (Pool: 28324093; Big: 7108); GC: 19
The following line caught my attention, since it relates to libuv, which to my understanding is tied to multithreading:
uv_update_time at /workspace/srcdir/libuv\src/win\core.c:105
I’ve seen an issue somewhere about clock synchronization. I tried the clock sync trick, but without success. I’m left with the idea that the issue may be related to Windows Pro vs. Home, but I doubt that’s a reasonable scenario.
Here is a smaller reproducible example showing the bug:
using Statistics
using Base.Threads: @threads
X = rand(Int(1.25e6), 100)
function get_edges(X::AbstractMatrix{T}, nbins=250) where {T}
    edges = Vector{Vector{T}}(undef, size(X, 2))
    @threads for i in 1:size(X, 2)
    # for i in 1:size(X, 2)
        edges[i] = quantile(view(X, :, i), (1:nbins) / nbins)
        if length(edges[i]) == 0
            edges[i] = [minimum(view(X, :, i))]
        end
    end
    return edges
end
println("num threads: ", Threads.nthreads())
println("trial 1: ")
edges = get_edges(X, 128);
println("trial 2: ")
edges = get_edges(X, 128);
println("trial 3: ")
edges = get_edges(X, 128);
println("trial 4: ")
edges = get_edges(X, 128);
The script is then run from the command line, with JULIA_NUM_THREADS set to 4:
C:\Evovest\EvoTrees.jl\experiments>julia thread_bug.jl
num threads: 4
trial 1:
trial 2:
Assertion failed: new_time >= loop->time, file src/win/core.c, line 105
signal (22): SIGABRT
in expression starting at C:\Evovest\EvoTrees.jl\experiments\thread_bug.jl:23
crt_sig_handler at /cygdrive/d/buildbot/worker/package_win64/build/src\signals-win.c:92
raise at C:\WINDOWS\System32\msvcrt.dll (unknown line)
abort at C:\WINDOWS\System32\msvcrt.dll (unknown line)
assert at C:\WINDOWS\System32\msvcrt.dll (unknown line)
uv_update_time at /workspace/srcdir/libuv\src/win\core.c:105
uv_run at /workspace/srcdir/libuv\src/win\core.c:371
jl_process_events at /cygdrive/d/buildbot/worker/package_win64/build/src\jl_uv.c:214
jl_task_get_next at /cygdrive/d/buildbot/worker/package_win64/build/src\partr.c:520
poptask at .\task.jl:704
wait at .\task.jl:712 [inlined]
task_done_hook at .\task.jl:442
jl_apply at /cygdrive/d/buildbot/worker/package_win64/build/src\julia.h:1690 [inlined]
jl_finish_task at /cygdrive/d/buildbot/worker/package_win64/build/src\task.c:198
start_task at /cygdrive/d/buildbot/worker/package_win64/build/src\task.c:717
Allocations: 1298065 (Pool: 1297660; Big: 405); GC: 3
If @threads is removed from the loop, there’s no crash to report.
With @threads, it crashes nondeterministically, after 2, 3, or more iterations.
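The failed assertion `new_time >= loop->time` reads as if the timer libuv samples appeared to go backwards. A rough way to probe this from Julia itself might look like the sketch below. It assumes that `time_ns()` is backed by the same high-resolution timer that libuv uses (that is my understanding, not something confirmed in this thread), and `count_backward_steps` is a hypothetical helper name, not part of Julia or libuv.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical helper (not part of Julia): on each thread, take `nsamples`
# consecutive readings of time_ns() and count how often a reading is
# smaller than the one taken just before it on the same thread.
# Assumption: time_ns() reads the same clock source as libuv.
function count_backward_steps(nsamples::Integer)
    backward = zeros(Int, nthreads())
    @threads for t in 1:nthreads()
        prev = time_ns()
        n = 0
        for _ in 1:nsamples
            cur = time_ns()
            n += (cur < prev)   # Bool is promoted to Int
            prev = cur
        end
        backward[t] = n         # each iteration writes its own slot
    end
    return backward
end

println("backward steps per thread: ", count_backward_steps(10^6))
```

A nonzero count would point toward the clock source rather than toward the `quantile` loop, while all zeros would not rule anything out, since the problem may only show up across cores or under load.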
Judging from the libuv discussion, this seems to be an Ice Lake problem. Can you run
julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, haswell)
Environment:
JULIA_PKGDIR = c:\Program Files\Julia-0.6.2\packages\
JULIA_SHELL = C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
so the experts can check?
This issue seems to be difficult to reproduce. I will try later on a Windows Home version; on my Pro systems there is no issue.
Julia ships its own version of libuv. I would hold off on any extreme action like reinstalling Windows, since it’s not clear that it would help at all. The thread has suggested some fixes, such as resetting the system time on the board, removing the board battery, and similar tips regarding BIOS changes and updates. Those are more reasonable to try first, before reinstalling Windows.
Thanks for the reply. Regarding the CPU, my laptop isn’t Ice Lake, but Ryzen. Here are the details:
julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: AMD Ryzen 7 4800HS with Radeon Graphics
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, znver2)
Environment:
JULIA_NUM_THREADS = 16
Since this is a fresh-out-of-the-box laptop, it isn’t obvious to me whether the motherboard is accessible, or whether opening it could be an issue for the warranty. I agree that the Windows update is a bit of a Hail Mary. I tried both Julia 1.5.2 and 1.4.2.
Are you aware of an application that uses libuv and would be easy to run? That could help clarify whether the issue is only with Julia’s libuv or system-wide.
This may sound like madness, but when I start the laptop after a shutdown, the bug seems to never happen. I can run the above trial blocks 10+ times with no issue. But if I do a restart, it crashes within the first 3 to 5 calls to that script.
What could go wrong when doing a restart on Windows vs. a shutdown? My understanding is that a restart should be a clean operation, as it clears everything, while a shutdown saves some state for a quick reboot, so it really puzzles me that a restart introduces issues.
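To put numbers on the restart vs. shutdown observation, the script could be launched repeatedly and the fraction of failed runs recorded after each kind of boot. The following is only a sketch: `crash_rate` is a hypothetical helper, and it treats any non-zero exit status as a crash, which is an assumption about how the SIGABRT shows up to the parent process.

```julia
# Hypothetical helper: run `cmd` `ntrials` times and return the fraction
# of runs that did not exit cleanly. Any non-zero exit status counts as
# a failure (assumption: the abort surfaces as an unsuccessful exit).
function crash_rate(cmd::Cmd, ntrials::Integer)
    failures = 0
    for _ in 1:ntrials
        proc = run(ignorestatus(cmd))  # ignorestatus: do not throw on failure
        failures += !success(proc)
    end
    return failures / ntrials
end

# Possible usage, with the thread count set for the child processes:
# withenv("JULIA_NUM_THREADS" => "4") do
#     crash_rate(`julia thread_bug.jl`, 10)
# end
```

Comparing the value obtained right after a restart with the one obtained after a full shutdown and power-on would show whether the difference holds up over more than a handful of runs.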