Annotating a for loop with @threads leads to many additional allocations inside the loop when the loop body uses a variable that was assigned in a preceding if statement. Replacing the if statement with a ternary operator removes these additional allocations.
Minimal example:
function g!(f::AbstractVector, flag::Bool, dx::Float64, dy::Float64)
    Δ = 0.0 # Not really needed, since if-statement does not introduce new scope
    if flag Δ = dy else Δ = dx end
    @inbounds Threads.@threads for i in eachindex(f)
        f[i] = Δ * f[i]
    end
    return nothing
end
function g_ternary!(f::AbstractVector, flag::Bool, dx::Float64, dy::Float64)
    Δ = flag ? dy : dx
    @inbounds Threads.@threads for i in eachindex(f)
        f[i] = Δ * f[i]
    end
    return nothing
end
using BenchmarkTools
N, flag = 2^16, true
y = rand(ComplexF64, N)
dx, dy = rand(2)
@btime g!($y, $flag, $dx, $dy) # 259.165 μs (196140 allocations: 5.00 MiB)
@btime g_ternary!($y, $flag, $dx, $dy) # 7.934 μs (41 allocations: 3.86 KiB)
# Note: The 41 allocations are normal and appear due to @threads
Without @threads, both variants are non-allocating, so the problem must lie somewhere in the multithreading.
Have I misunderstood something, or is this a bug?
julia> versioninfo()
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake)
Environment:
JULIA_NUM_THREADS = 8
julia> @code_warntype g!(y, flag, dx, dy)
MethodInstance for g!(::Vector{ComplexF64}, ::Bool, ::Float64, ::Float64)
from g!(f::AbstractVector, flag::Bool, dx::Float64, dy::Float64) in Main at /tmp/foo.jl:51
Arguments
#self#::Core.Const(g!)
f::Vector{ComplexF64}
flag::Bool
dx::Float64
dy::Float64
Locals
threadsfor_fun::var"#150#threadsfor_fun#19"{var"#150#threadsfor_fun#18#20"{Vector{ComplexF64}, Base.OneTo{Int64}}}
Δ::Core.Box
Δ is boxed. Writing Δ = if flag dy else dx end matches your ternary operator more closely, and with this change g! and g_ternary! perform identically for me.
It's weird that it's boxed only when @threads is used:
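For reference, here is a minimal sketch of g! with the if used as an expression. The name g_ifexpr! is made up for illustration, and I moved @inbounds into the loop body, which is my own choice rather than something from the original post. Since Δ is assigned only once, it should behave like the ternary version:

```julia
# Hypothetical name g_ifexpr!; same signature as g! from the question.
function g_ifexpr!(f::AbstractVector, flag::Bool, dx::Float64, dy::Float64)
    # `if` is an expression in Julia, so Δ is assigned exactly once here.
    Δ = if flag dy else dx end
    Threads.@threads for i in eachindex(f)
        @inbounds f[i] = Δ * f[i]
    end
    return nothing
end

y = rand(ComplexF64, 2^16)
g_ifexpr!(y, true, 0.5, 2.0)  # scales y in place by dy = 2.0
```

Benchmarking it with @btime next to g_ternary! is the easiest way to confirm this on your own machine.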
julia> function g(flag::Bool, dx::Float64, dy::Float64)
           Δ = 0.0
           if flag Δ = dy else Δ = dx end
           return Δ
       end
julia> @code_warntype g(true, 3.0, 1.0)
MethodInstance for g(::Bool, ::Float64, ::Float64)
from g(flag::Bool, dx::Float64, dy::Float64) in Main at REPL[1]:1
Arguments
#self#::Core.Const(g)
flag::Bool
dx::Float64
dy::Float64
Locals
Δ::Float64
Body::Float64
1 ─ (Δ = 0.0)
└── goto #3 if not flag
2 ─ (Δ = dy)
└── goto #4
3 ─ (Δ = dx)
4 ┄ return Δ
julia> function g(flag::Bool, dx::Float64, dy::Float64)
           Δ = 0.0
           if flag Δ = dy else Δ = dx end
           Threads.@threads for _ = 1:10
               f = Δ
           end
           return Δ
       end
g (generic function with 1 method)
julia> @code_warntype g(true, 3.0, 1.0)
MethodInstance for g(::Bool, ::Float64, ::Float64)
from g(flag::Bool, dx::Float64, dy::Float64) in Main at REPL[3]:1
Arguments
#self#::Core.Const(g)
flag::Bool
dx::Float64
dy::Float64
Locals
threadsfor_fun::var"#15#threadsfor_fun#2"{var"#15#threadsfor_fun#1#3"{UnitRange{Int64}}}
Δ@_6::Core.Box
threadsfor_fun#1::var"#15#threadsfor_fun#1#3"{UnitRange{Int64}}
range::UnitRange{Int64}
Δ@_9::Union{}
Body::Any
I don’t quite understand why the compiler thinks Δ may get modified in the loop.
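A possible explanation, as far as I understand it: @threads wraps the loop body in a closure (the threadsfor_fun in the output above), and a variable that is assigned more than once and then captured by a closure gets boxed when the code is lowered, before type inference runs. If that is right, a plain closure should show the same box without any threading, and rebinding Δ in a let block (the trick described in the performance tips of the manual) should avoid it. A rough sketch, where g_closure and g_let! are names I made up:

```julia
# Hypothetical reproduction without @threads: Δ is assigned more than once
# and then captured by the anonymous function h.
function g_closure(flag::Bool, dx::Float64, dy::Float64)
    Δ = 0.0
    if flag Δ = dy else Δ = dx end
    h = () -> Δ
    return h()
end

# Hypothetical workaround that keeps the if statement: the let block creates
# a new binding that is assigned exactly once, and the closure generated by
# @threads captures that binding instead of the outer Δ.
function g_let!(f::AbstractVector, flag::Bool, dx::Float64, dy::Float64)
    Δ = 0.0
    if flag Δ = dy else Δ = dx end
    let Δ = Δ
        Threads.@threads for i in eachindex(f)
            @inbounds f[i] = Δ * f[i]
        end
    end
    return nothing
end
```

Running @code_warntype g_closure(true, 3.0, 1.0) and @code_warntype on g_let! should show whether Δ still appears as a Core.Box in each case.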
So the type instability explains where the allocations come from.
It’s somewhat strange that it’s only unstable in combination with @threads, but since we have a workaround, @giordano's answer resolves the issue for me.
Thanks a lot everyone!