Hello all - I'm really confused by the following large performance hit, which I believe is due to type instability, in a really simple, standard piece of code. It took a while to boil it down to this MWE. It is triggered when an array is (seemingly trivially) allocated more than once and then accessed in a @threads loop:
julia> function unstable()
           x = zeros(0)   # if this line is commented out, the function becomes fast and type-stable
           x = zeros(1_000_000)
           Threads.@threads for i in eachindex(x)
               x[i] = 1.0
           end
           x
       end
unstable (generic function with 1 method)
julia> Threads.nthreads()
1
julia> @btime unstable();
17.816 ms (999502 allocations: 22.88 MiB)
Note that x is unambiguously a Vector{Float64} in every operation (a type annotation can be added explicitly, but of course to no effect). Yet somehow the number of allocations is close to the iteration count. If the line marked with the comment is removed (so x is not allocated twice), one gets:
julia> @btime unstable();
1.382 ms (8 allocations: 7.63 MiB)
Note that I am running Julia with one thread, so it's not a "too many threads" problem. Now, the output of @code_warntype unstable() includes Body::Any in red, while the second version has the expected Body::Vector{Float64}.
In the first version it does not matter what size the first allocation is; it could be the same as the 2nd, but there have to be two of them (in my code the allocation was conditional on an input arg, but it seems that is not needed for a MWE).
Fixes include removing @threads, or inserting a let x=x block around the loop (I don't understand this).
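For reference, the let x=x workaround mentioned above can be sketched as follows. The function name stable_let is hypothetical. The reasoning is an assumption drawn from the @code_warntype output in this post (x shows up as a Core.Box) and from the "captured variable" section of the Julia performance tips: @threads wraps the loop body in a closure, and a captured variable that is assigned more than once gets boxed. The let block introduces a fresh binding that is assigned exactly once, so the closure captures that one instead.

```julia
# Sketch of the `let x = x` workaround (function name is hypothetical).
function stable_let()
    x = zeros(0)
    x = zeros(1_000_000)   # x is still assigned twice, as in the MWE
    let x = x              # new local binding, assigned exactly once
        # The closure created by @threads captures the inner x only.
        Threads.@threads for i in eachindex(x)
            x[i] = 1.0
        end
    end
    return x               # the outer x refers to the same array
end

y = stable_let()
```

Running @code_warntype stable_let() and @btime stable_let() is a quick way to check whether the Box and the per-iteration allocations disappear on a given Julia version.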
But I want the loop to use multithreading, and the array to be reallocated based on a condition. This is a really simple piece of textbook code, basically taken straight from the manual, so I'm rather concerned by it and by the >10x speed hit (which also happened in my original code).
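One way to keep both the conditional reallocation and the threaded loop is a function barrier, sketched below. The names fill_ones! and conditional_fill, and the grow argument, are hypothetical stand-ins for the original code, which is not shown in this post. The assumption is the same as for the let workaround: inside the helper, x is a function argument that is never reassigned, so the closure created by @threads can capture it without boxing, while the caller is free to reassign its own x because no closure captures it there.

```julia
# Hypothetical helper: the threaded loop lives in its own function,
# where x is an argument that is never reassigned.
function fill_ones!(x::AbstractVector{Float64})
    Threads.@threads for i in eachindex(x)
        x[i] = 1.0
    end
    return x
end

# Hypothetical caller: x may be reallocated depending on an input argument.
function conditional_fill(grow::Bool)
    x = zeros(10)
    if grow
        x = zeros(1_000_000)   # second assignment, but no closure captures x here
    end
    return fill_ones!(x)
end

small = conditional_fill(false)
large = conditional_fill(true)
```

Whether this matches the structure of the real code is a guess; the point is only that the reassignment and the @threads loop end up in different function bodies.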
Here's the first @code_warntype output (in my original example there were no warnings about threadsfor_fun, merely a Body::Any):
julia> @code_warntype unstable();
MethodInstance for unstable()
from unstable() @ Main REPL[16]:1
Arguments
#self#::Core.Const(unstable)
Locals
threadsfor_fun::var"#81#threadsfor_fun#14"{var"#81#threadsfor_fun#13#15"{_A}} where _A
x@_3::Core.Box
threadsfor_fun#13::var"#81#threadsfor_fun#13#15"
range::Any
x@_6::Union{}
x@_7::Union{}
Body::Any
1 ── Core.NewvarNode(:(threadsfor_fun))
│ (x@_3 = Core.Box())
│ %3 = Main.zeros(0)::Vector{Float64}
│ Core.setfield!(x@_3, :contents, %3)
│ %5 = Main.zeros(1000000)::Vector{Float64}
│ Core.setfield!(x@_3, :contents, %5)
│ %7 = Core.isdefined(x@_3, :contents)::Bool
└─── goto #3 if not %7
2 ── goto #4
3 ── Core.NewvarNode(:(x@_6))
└─── x@_6
4 ── %12 = Core.getfield(x@_3, :contents)::Any
│ %13 = Main.eachindex(%12)::Any
│ (range = %13)
│ %15 = Main.:(var"#81#threadsfor_fun#13#15")::Core.Const(var"#81#threadsfor_fun#13#15")
│ %16 = Core.typeof(range)::DataType
│ %17 = Core.apply_type(%15, %16)::Type{var"#81#threadsfor_fun#13#15"{_A}} where _A
│ %18 = x@_3::Core.Box
│ (threadsfor_fun#13 = %new(%17, %18, range))
│ %20 = Main.:(var"#81#threadsfor_fun#14")::Core.Const(var"#81#threadsfor_fun#14")
│ %21 = Core.typeof(threadsfor_fun#13)::Type{var"#81#threadsfor_fun#13#15"{_A}} where _A
│ %22 = Core.apply_type(%20, %21)::Type{var"#81#threadsfor_fun#14"{var"#81#threadsfor_fun#13#15"{_A}}} where _A
│ (threadsfor_fun = %new(%22, threadsfor_fun#13))
│ %24 = threadsfor_fun::var"#81#threadsfor_fun#14"{var"#81#threadsfor_fun#13#15"{_A}} where _A
│ Core.ifelse(false, false, %24)
└─── goto #6 if not true
5 ── Base.Threads.threading_run(threadsfor_fun, false)
└─── goto #7
6 ── Core.Const(:($(Expr(:foreigncall, :(:jl_in_threaded_region), Int32, svec(), 0, :(:ccall)))))
│ Core.Const(:(%29 != 0))
│ Core.Const(:(goto %34 if not %30))
│ Core.Const(:(Base.Threads.error("`@threads :static` cannot be used concurrently or nested")))
│ Core.Const(:(goto %35))
└─── Core.Const(:(Base.Threads.threading_run(threadsfor_fun, true)))
7 ── Base.Threads.nothing
│ %36 = Core.isdefined(x@_3, :contents)::Bool
└─── goto #9 if not %36
8 ── goto #10
9 ── Core.NewvarNode(:(x@_7))
└─── x@_7
10 ─ %41 = Core.getfield(x@_3, :contents)::Any
└─── return %41
I am on an 8-core ryzen2 Ubuntu laptop, running Julia 1.10.0. I tried other releases and saw no difference. Thanks for any help; this cost me a couple of hours of painful debugging!