Multi-threading issue on Windows 10 Home or libuv?

Hello, has anyone experienced a multi-threading issue on Windows?

I just installed Julia on a brand-new laptop running Windows 10 Home. A test script works fine with JULIA_NUM_THREADS=1, but running the same script with JULIA_NUM_THREADS=4 results in the following error:

signal (22): SIGABRT
in expression starting at C:\Evovest\EvoTrees.jl\experiments\random_test.jl:36
crt_sig_handler at /cygdrive/d/buildbot/worker/package_win64/build/src\signals-win.c:92
raise at C:\Windows\System32\msvcrt.dll (unknown line)
abort at C:\Windows\System32\msvcrt.dll (unknown line)
assert at C:\Windows\System32\msvcrt.dll (unknown line)
uv_update_time at /workspace/srcdir/libuv\src/win\core.c:105
uv_run at /workspace/srcdir/libuv\src/win\core.c:371
jl_process_events at /cygdrive/d/buildbot/worker/package_win64/build/src\jl_uv.c:214
jl_task_get_next at /cygdrive/d/buildbot/worker/package_win64/build/src\partr.c:520
poptask at .\task.jl:704
wait at .\task.jl:712 [inlined]
task_done_hook at .\task.jl:442
jl_apply at /cygdrive/d/buildbot/worker/package_win64/build/src\julia.h:1690 [inlined]
jl_finish_task at /cygdrive/d/buildbot/worker/package_win64/build/src\task.c:198
start_task at /cygdrive/d/buildbot/worker/package_win64/build/src\task.c:717
Allocations: 28331201 (Pool: 28324093; Big: 7108); GC: 19

The script in question is the test for the package: https://github.com/Evovest/EvoTrees.jl/blob/master/test/core.jl
which works fine on 2 other Windows 10 Pro machines, on Ubuntu servers, and on the Travis CI of the package.

The following line caught my attention, as it relates to libuv, which to my understanding is tied to multi-threading:

uv_update_time at /workspace/srcdir/libuv\src/win\core.c:105

I’ve seen an issue about clock synchronization somewhere. I tried the clock sync trick, but without success. I’m left with the idea that the issue may be related to Windows Pro vs Home, but I doubt that’s a reasonable scenario.

Here is a smaller reproducible example showing the bug:

using Statistics
using Base.Threads: @threads

X = rand(Int(1.25e6), 100)

function get_edges(X::AbstractMatrix{T}, nbins=250) where {T}
    edges = Vector{Vector{T}}(undef, size(X,2))
    @threads for i in 1:size(X, 2)
    # for i in 1:size(X, 2)
        edges[i] = quantile(view(X, :,i), (1:nbins)/nbins)
        if length(edges[i]) == 0
            edges[i] = [minimum(view(X, :,i))]
        end
    end
    return edges
end

println("num threads: ", Threads.nthreads())

println("trial 1: ")
edges = get_edges(X, 128);

println("trial 2: ")
edges = get_edges(X, 128);

println("trial 3: ")
edges = get_edges(X, 128);

println("trial 4: ")
edges = get_edges(X, 128);

The script is then run from the CLI, with JULIA_NUM_THREADS set to 4:

C:\Evovest\EvoTrees.jl\experiments>julia thread_bug.jl

num threads: 4
trial 1:
trial 2:
Assertion failed: new_time >= loop->time, file src/win/core.c, line 105

signal (22): SIGABRT
in expression starting at C:\Evovest\EvoTrees.jl\experiments\thread_bug.jl:23
crt_sig_handler at /cygdrive/d/buildbot/worker/package_win64/build/src\signals-win.c:92
raise at C:\WINDOWS\System32\msvcrt.dll (unknown line)
abort at C:\WINDOWS\System32\msvcrt.dll (unknown line)
assert at C:\WINDOWS\System32\msvcrt.dll (unknown line)
uv_update_time at /workspace/srcdir/libuv\src/win\core.c:105
uv_run at /workspace/srcdir/libuv\src/win\core.c:371
jl_process_events at /cygdrive/d/buildbot/worker/package_win64/build/src\jl_uv.c:214
jl_task_get_next at /cygdrive/d/buildbot/worker/package_win64/build/src\partr.c:520
poptask at .\task.jl:704
wait at .\task.jl:712 [inlined]
task_done_hook at .\task.jl:442
jl_apply at /cygdrive/d/buildbot/worker/package_win64/build/src\julia.h:1690 [inlined]
jl_finish_task at /cygdrive/d/buildbot/worker/package_win64/build/src\task.c:198
start_task at /cygdrive/d/buildbot/worker/package_win64/build/src\task.c:717
Allocations: 1298065 (Pool: 1297660; Big: 405); GC: 3

If @threads is removed from the loop, there’s no crash to report.
With @threads, it crashes nondeterministically, after 2, 3 or more iterations.

This libuv issue seems to be related: https://github.com/libuv/libuv/issues/1633, as well as this associated patch: https://github.com/libuv/libuv/commit/796744869669842bd5405a71de8ba60b1556fc24.
Is Julia using a Windows system libuv, or should it be using its own? I.e., is it more likely that I should consider a Windows reinstall, or could the problem be with the libuv shipped with Julia (if any)?
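Since the failed assertion is `new_time >= loop->time`, one way to narrow this down might be to check whether the high-resolution clock ever appears to step backwards when it is read from several threads. The sketch below is only a rough diagnostic, not a fix. It assumes that `time_ns()` reads the same high-resolution timer that libuv uses for its loop time, which is my understanding but not something confirmed in this thread. The function name `count_backward_steps` is made up for illustration.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical helper: read time_ns() from many threads, one reading at a time
# under a lock, and count how often a reading is smaller than the previous one.
# Because the lock serializes the readings, a monotonic clock should give 0.
function count_backward_steps(nsamples::Integer)
    lk = ReentrantLock()
    last_seen = Ref(time_ns())   # most recent reading taken by any thread
    backward = Ref(0)            # number of readings that went backwards
    @threads for _ in 1:nsamples
        lock(lk) do
            t = time_ns()
            if t < last_seen[]
                backward[] += 1
            end
            last_seen[] = t
        end
    end
    return backward[]
end

println("num threads: ", nthreads())
println("backward clock steps: ", count_backward_steps(200_000))
```

A nonzero count after a restart, and zero after a cold boot, would point towards the timer rather than towards Julia or the test script itself.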

From the libuv discussion, it seems to be an Ice Lake problem. Can you run

julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, haswell)
Environment:
  JULIA_PKGDIR = c:\Program Files\Julia-0.6.2\packages\
  JULIA_SHELL = C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe

so that the experts can check?

This issue seems difficult to reproduce. On my Pro systems there is no issue; I will try later on a Windows Home version.

Julia ships its own version of libuv. I would hold off on any extreme action like reinstalling Windows; it’s not clear that it would help at all. There have been some solutions in the thread, such as resetting the system time on the board, removing the board battery, and similar tips regarding BIOS changes and updates. Those are more reasonable to test first before reinstalling Windows.
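To see which libuv the running Julia process actually uses, something like the following may work. It is a small sketch that assumes the libuv C function `uv_version_string` is exported by the Julia runtime, which I have not verified on every Julia version or platform; the variable name `libuv_version` is just for illustration.

```julia
# uv_version_string() is part of libuv's public C API. Calling it through ccall
# without a library name looks the symbol up in the running Julia process,
# so the answer comes from the libuv that Julia itself was built with.
libuv_version = unsafe_string(ccall(:uv_version_string, Cstring, ()))
println("libuv version reported inside Julia: ", libuv_version)
```

If this prints a version, the libuv code in the stack trace is the bundled one and not a library installed by Windows.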

Thanks for the reply. Regarding the CPU, my laptop isn’t Ice Lake, but Ryzen. Here are the details:

julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 7 4800HS with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver2)
Environment:
  JULIA_NUM_THREADS = 16

Being a fresh out-of-the-box laptop, it isn’t obvious to me whether the motherboard would be accessible, or whether opening it could be an issue for the warranty. I agree that the Windows update is a bit of a Hail Mary :slight_smile: I tried both Julia 1.5.2 and 1.4.2.

Are you aware of an easy-to-run application that uses libuv? That could help clarify whether the issue is only with Julia’s libuv or system-wide.

A BIOS update would always be possible, but again, it is not clear whether it would help, and it should only be done if you feel safe enough doing it.

No, perhaps someone else can answer this. Let’s wait.

This may sound like madness, but when I start the laptop after a shutdown, the bug never seems to happen. I can run the above trial blocks 10+ times with no issue. But if I do a restart, it crashes within the first 3 to 5 calls to that script.

What could go wrong when doing a restart on Windows vs a shutdown? My understanding is that a restart should be a clean operation, as it clears everything, while a shutdown saves some state for a quick reboot, so it really puzzles me that a restart introduces issues.
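To compare the two boot modes with numbers rather than impressions, the script could be launched repeatedly in fresh Julia processes and the failing runs counted. This is a minimal sketch: `count_failures` is a hypothetical helper, and it assumes the script sits at the given path and that a crash shows up as a non-zero exit code.

```julia
# Hypothetical helper: run `script` in `runs` fresh Julia processes with
# JULIA_NUM_THREADS set to `threads`, and count the runs that do not exit cleanly.
function count_failures(script::AbstractString; runs::Integer = 10, threads::Integer = 4)
    julia = joinpath(Sys.BINDIR, Base.julia_exename())
    cmd = `$julia $script`
    failures = 0
    # Child processes inherit the environment set by withenv.
    withenv("JULIA_NUM_THREADS" => string(threads)) do
        for _ in 1:runs
            ok = success(pipeline(cmd; stdout = devnull, stderr = devnull))
            ok || (failures += 1)
        end
    end
    return failures
end

# Only run the comparison if the reproducer from this thread is present.
if isfile("thread_bug.jl")
    println("failed runs: ", count_failures("thread_bug.jl"; runs = 10, threads = 4))
end
```

Running this once after a cold boot and once after a restart would show whether the difference holds up over more than a handful of attempts.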

The difference may be the BIOS here, as a Windows restart doesn’t restart the BIOS (AFAIK).