I got a new 16" MacBook Pro (M1 Max, 64 GB of memory) and installed both the 1.7.2 ARM build and the 1.6.5 macOS build (i.e. non-ARM). The simple Threads.@threads example from the docs behaves as expected on the non-ARM build but not on the ARM build.
Here is the expected result on the 1.6.5 non-ARM build:
~ % julia6 -t 8
...
julia> Threads.nthreads()
8
julia> a = zeros(Int, 10);
julia> Threads.@threads for i in 1:length(a)
a[i] = Threads.threadid()
end
julia> a'
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 2 2 3 4 5 6 7 8
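For reference, this pattern is what an even, contiguous split of 10 iterations over 8 threads produces. divrem(10, 8) = (1, 2), so every thread gets one iteration and the first two threads get one extra. The sketch below reproduces that split. static_split is a hypothetical helper, and I'm assuming it mirrors how the :static schedule (and, apparently, the default schedule on 1.6) chunks the range.

```julia
# Hypothetical helper: thread ID assigned to each iteration when n iterations
# are split into nt contiguous chunks, with the first rem(n, nt) chunks one
# element longer (the split assumed here for the :static schedule).
function static_split(n::Integer, nt::Integer)
    base, extra = divrem(n, nt)
    ids = Int[]
    for t in 1:nt
        len = base + (t <= extra ? 1 : 0)   # chunk length for thread t
        append!(ids, fill(t, len))
    end
    return ids
end

println(static_split(10, 8)')   # [1 1 2 2 3 4 5 6 7 8]
```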
But on the 1.7.2 ARM build, not every array element gets a thread ID:
~ % julia7 -t 8
...
julia> Threads.nthreads()
8
julia> a = zeros(Int, 10);
julia> Threads.@threads for i in 1:length(a)
a[i] = Threads.threadid()
end
julia> a'
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 0 0 0 0 0 0 0 0
I tried starting with different numbers of threads, but it makes no difference: some elements at the tail end of the array always stay zero. I’ll install and check the non-ARM 1.7.2 build next.
Anyone else seen similar problems?
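A slightly more talkative version of the test can make reports like these easier to compare. This is only a sketch, not code from the thread: threadid_fill is a hypothetical name, and besides the array it prints the JULIA_COPY_STACKS setting, counts elements that were never written (still zero), and lists the distinct thread IDs that actually ran.

```julia
# Hypothetical diagnostic variant of mumuse4.jl: fill an array with thread IDs,
# then summarize what happened instead of only printing the array.
using Base.Threads

function threadid_fill(n::Integer)
    a = zeros(Int, n)          # 0 means "this element was never written"
    @threads for i in 1:n
        a[i] = threadid()
    end
    return a
end

a = threadid_fill(10)
println("nthreads()          = ", nthreads())
println("JULIA_COPY_STACKS   = ", get(ENV, "JULIA_COPY_STACKS", "(unset)"))
println("thread IDs          = ", a')
println("unwritten elements  = ", count(iszero, a))
println("distinct thread IDs = ", sort(unique(filter(!iszero, a))))
```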
You should try the 1.8 beta. The 1.7.x ARM builds are very experimental.
Julia Version 1.8.0-DEV.1434
Commit 4abf26eec8 (2022-01-30 20:04 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.2.0)
CPU: Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.0 (ORCJIT, cyclone)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 8
Contents of mumuse4.jl:
@show Threads.nthreads()
a = zeros(Int, 10);
Threads.@threads for i in 1:length(a)
a[i] = Threads.threadid()
end
a'
julia> include("mumuse4.jl")
Threads.nthreads() = 8
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 2 2 3 4 5 6 7 8
Thanks Oscar and Laurent. It is better on 1.8.0-beta1, but only thread 1 is used:
Julia Version 1.8.0-beta1
Commit 7b711ce699 (2022-02-23 15:09 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.2.0)
CPU: 10 × Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 8 on 8 virtual cores
Environment:
JULIA_COPY_STACKS = 1
JULIA_NUM_THREADS = 8
Still not what I would expect:
julia> include("mumuse4.jl")
Threads.nthreads() = 8
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 1 1 1 1 1 1 1 1
What is even worse is that the buggy behaviour also shows up on the 1.7.2 x86 build:
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.5.0)
CPU: Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, westmere)
Environment:
JULIA_COPY_STACKS = 1
JULIA_NUM_THREADS = 8
The tail-end values still get no thread IDs:
julia> include("mumuse4.jl")
Threads.nthreads() = 8
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 0 0 0 0 0 0 0 0
This might be a weird version of https://github.com/JuliaLang/julia/issues/41820 .
I can’t reproduce the behaviour you see with 1 1 1 1 as the output.
julia> a = zeros(Int, 10);
julia> Threads.@threads for i in 1:length(a)
a[i] = Threads.threadid()
end
julia> a'
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 1 2 2 2 3 3 4 4
julia> versioninfo()
Julia Version 1.9.0-DEV.165
Commit 8076517c97* (2022-03-09 19:46 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.3.0)
CPU: 8 × Apple M1
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 4 on 4 virtual cores
Environment:
JULIA_NUM_PRECOMPILE_TASKS = 4
JULIA_NUM_THREADS = 4
Thanks @gbaraldi, I’ll check that issue. In the meantime I also built and tested the latest Julia master branch, but I see the same strange, buggy behaviour:
julia> versioninfo()
Julia Version 1.9.0-DEV.174
Commit 258ddc07d4 (2022-03-12 08:01 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.3.0)
CPU: 10 × Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 8 on 8 virtual cores
Environment:
JULIA_COPY_STACKS = 1
JULIA_NUM_THREADS = 8
julia> include("mumuse4.jl")
Threads.nthreads() = 8
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 1 1 1 1 1 1 1 1
tkf, March 12, 2022, 10:41am
Try @threads :static for ...
I also tried with :static, but it makes no difference.
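For reference, here is the form tkf suggested, written out as a complete script. It assumes a Julia version that accepts a schedule argument to @threads. With :static, each thread should get one contiguous chunk of the range, so in a healthy session no element should stay zero.

```julia
using Base.Threads

a = zeros(Int, 10)

# :static gives each thread one contiguous chunk of 1:length(a) and keeps it
# on that thread, so in a healthy session no element should remain zero.
@threads :static for i in 1:length(a)
    a[i] = threadid()
end

println("nthreads() = ", nthreads())
println("a'         = ", a')
println("unwritten  = ", count(iszero, a))
```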
To summarise, running the above code (with or without :static) on my M1 Max MacBook Pro (latest macOS) gives:
Expected behaviour on Julia 1.6.5 x86:
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 2 2 3 4 5 6 7 8
Buggy behaviour on Julia 1.7.2 x86 and Julia 1.7.2 arm:
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 0 0 0 0 0 0 0 0
“Non-buggy” but only using threadid 1 on Julia 1.8.0-beta1 and on Julia 1.9.0-DEV.174:
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 1 1 1 1 1 1 1 1
The behaviour is the same whether I start with 2, 4, or 8 threads (via julia -t).
It is strange that I cannot reproduce your results on my M1 Max. I will try to find some time later this weekend to switch Julia versions.
Adding these results from running Base.runtests(["threads"]; ncores = 8) on different versions, since they might be relevant:
No errors on Julia 1.6.5 x86
Freezes on Julia 1.7.2 x86
Co-schedule error on Julia 1.7.2 arm and 1.8.0-beta1 and 1.9.0-DEV.174
The Co-schedule error is the same on 1.7.2 arm, 1.8.0-beta1 arm, and 1.9.0-DEV.174 arm (only the line numbers differ, most likely due to changes in the test code):
Test Failed at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/test/threads_exec.jl:848
Expression: (current_task()).sticky == true
Evaluated: false == true
Co-schedule: Error During Test at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/test/threads_exec.jl:844
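The failing check reads the sticky field of the current task. As a minimal illustration (not taken from the test file), tasks created with @async are sticky, meaning they stay on the thread that created them, while tasks created with Threads.@spawn are not:

```julia
# Task stickiness: @async tasks stay on the thread that created them,
# Threads.@spawn tasks are free to run on any thread.
sticky_async = fetch(@async current_task().sticky)
sticky_spawn = fetch(Threads.@spawn current_task().sticky)

println("@async task sticky:         ", sticky_async)   # true
println("Threads.@spawn task sticky: ", sticky_spawn)   # false
```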
Still unable to reproduce your bug on the latest master… I have no idea why.
Julia Version 1.9.0-DEV.174
Commit 258ddc07d4 (2022-03-12 08:01 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.3.0)
CPU: 10 × Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 8 on 8 virtual cores
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 8
julia> include("mumuse4.jl")
Threads.nthreads() = 8
a' = [2 2 1 1 5 3 7 4 8 6]
1×10 adjoint(::Vector{Int64}) with eltype Int64:
2 2 1 1 5 3 7 4 8 6
julia> include("mumuse4.jl")
Threads.nthreads() = 8
a' = [1 1 2 2 3 5 7 4 8 6]
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 2 2 3 5 7 4 8 6
julia> include("mumuse4.jl")
Threads.nthreads() = 8
a' = [2 2 1 1 3 7 5 4 8 6]
1×10 adjoint(::Vector{Int64}) with eltype Int64:
2 2 1 1 3 7 5 4 8 6
Elrod, March 12, 2022, 6:52pm
I can reproduce it by setting the environment variable JULIA_COPY_STACKS=1.
If I do not set it, I get the expected behavior.
This is on a regular (4/4) M1.
I can also reproduce this on an Intel CPU.
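Since the variable appears to be read only when the Julia process starts, changing ENV inside a running session is unlikely to help. One way to compare both settings from a single session is to launch fresh child processes. This is a sketch under that assumption: threadids_with_copy_stacks and SNIPPET are hypothetical names, and addenv needs Julia 1.6 or later.

```julia
# Sketch: run the same threaded fill in fresh Julia processes with different
# JULIA_COPY_STACKS values and compare. Assumes the runtime reads the variable
# only at startup, so it must be set for the child process, not via ENV here.
const SNIPPET = "using Base.Threads; a = zeros(Int, 10); " *
                "@threads for i in eachindex(a); a[i] = threadid(); end; " *
                "println(join(a, ' '))"

function threadids_with_copy_stacks(value::AbstractString; nt::Integer = 8)
    cmd = `$(Base.julia_cmd()) --startup-file=no -t $nt -e $SNIPPET`
    out = read(addenv(cmd, "JULIA_COPY_STACKS" => value), String)
    return parse.(Int, split(out))  # one thread ID per array element
end

for v in ("0", "1")
    try
        ids = threadids_with_copy_stacks(v)
        println("JULIA_COPY_STACKS=", v, ": ", join(ids, ' '),
                "  (unwritten: ", count(iszero, ids), ")")
    catch err
        # A misbehaving runtime may also make the child exit with an error.
        println("JULIA_COPY_STACKS=", v, ": child process failed: ", err)
    end
end
```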
Thanks Chris.
That seems to be it. If I set JULIA_COPY_STACKS=0, the behaviour is as expected:
julia> include("mumuse4_static.jl")
Julia Version 1.9.0-DEV.174
Commit 258ddc07d4 (2022-03-12 08:01 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.3.0)
CPU: 10 × Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 8 on 8 virtual cores
Environment:
JULIA_COPY_STACKS = 0
JULIA_NUM_THREADS = 8
versioninfo() = nothing
Threads.nthreads() = 8
1×10 adjoint(::Vector{Int64}) with eltype Int64:
1 1 2 2 3 4 5 6 7 8
Correct behaviour also on 1.8.0-beta1 arm, 1.7.2 arm, and 1.7.2 x86.
Seems the docs might need to mention this.
Anyway, thanks to you all that got involved.
I checked my .bashrc and found that I had originally set JULIA_COPY_STACKS=1 for the Taro.jl package. I don’t remember why, but I have now deleted it. Anyway, thanks again.
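If the variable ends up in a shell profile again, a small guard in startup.jl could flag it early. This is a hypothetical sketch, not something from the thread or the Julia docs: copy_stacks_warning is an invented name, it treats any value other than empty or "0" as enabled (which may be overly cautious), and the message only restates what was observed here.

```julia
# Hypothetical guard, e.g. for ~/.julia/config/startup.jl.
# Returns a warning message, or nothing if the setting looks harmless.
function copy_stacks_warning(env = ENV, nt::Integer = Threads.nthreads())
    v = get(env, "JULIA_COPY_STACKS", "")
    if v in ("", "0") || nt <= 1
        return nothing
    end
    return "JULIA_COPY_STACKS=$v with $nt threads: Threads.@threads may misbehave " *
           "(see JuliaLang/julia#44589)"
end

let msg = copy_stacks_warning()
    msg === nothing || @warn msg
end
```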
Elrod, March 12, 2022, 7:17pm
Taro.jl relies on JavaCall.jl, which requires JULIA_COPY_STACKS=1.
EDIT: I’ve filed an issue: https://github.com/JuliaLang/julia/issues/44589
Well, there it is then. JavaCall.jl still asks users to set JULIA_COPY_STACKS to 1, which then won’t work on M1 Macs, I guess…
https://github.com/JuliaInterop/JavaCall.jl
Elrod, March 12, 2022, 7:25pm
It does not work on Intel Linux either.
EDIT: Nor does it work on AMD Linux, not that I was expecting it to. Pretty sure it’s just broken for all platforms.
OK, wow, good that we found it then.