I created an MWE for this question. In practice, the only difference between the f functions is the scope in which T = Int is introduced, yet the first two versions are very slow:
using Base.Threads: @threads, nthreads

function makechunks(X::AbstractVector, n::Integer)
    c = length(X) ÷ n
    return [X[1+c*k:(k == n-1 ? end : c*k+c)] for k = 0:n-1]
end
function f1(vcs) # very slow
    T = Int
    smap = Dict("a" => [1:5000;], "b" => [1:5000;])
    @threads for chunk in makechunks(vcs, nthreads())
        c = 0
        for v in chunk
            for s in v
                for idx in smap[s]
                    c += one(T)
                end
            end
        end
    end
    return
end
function f2(vcs) # very slow
    T = Int
    smap = Dict("a" => T[1:5000;], "b" => T[1:5000;])
    @threads for chunk in makechunks(vcs, nthreads())
        T = Int
        c = 0
        for v in chunk
            for s in v
                for idx in smap[s]
                    c += one(T)
                end
            end
        end
    end
    return
end
function f3(vcs) # fast
    Q = Int
    smap = Dict("a" => Q[1:5000;], "b" => Q[1:5000;])
    @threads for chunk in makechunks(vcs, nthreads())
        T = Int
        c = 0
        for v in chunk
            for s in v
                for idx in smap[s]
                    c += one(T)
                end
            end
        end
    end
    return
end
Yes, with nuance. @threads does put the loop body into a closure that captures T in f1 and f2, but that closure issue has improved since then, and captured variables that are never reassigned are now inferred fine. Here's a demonstration of that:
julia> let
           x = Int
           onex() = one(x)
       end
(::var"#onex#3"{DataType}) (generic function with 1 method)

julia> let
           x = Int
           onex() = (x = Float64; one(x))
       end
(::var"#onex#9") (generic function with 1 method)
In the 1st case, like f1, the closure has a type parameter DataType for the type of the captured variable x; in the 2nd case, like f2, where the closure reassigns x, that parameter is absent because the compiler fell back to not inferring it (the variable's value is stored in a Core.Box, which is like a Ref{Any}).
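If you want to see the difference directly, here is a small sketch (my own illustration; the generated closure names vary per session) that inspects the closures' field types:

# Captured but never reassigned: x lives in a typed field of the closure.
onex_good = let
    x = Int
    onex() = one(x)
end
fieldtypes(typeof(onex_good))  # (DataType,)

# Captured and reassigned inside the closure: x lives in a Core.Box.
onex_bad = let
    x = Int
    onex() = (x = Float64; one(x))
end
fieldtypes(typeof(onex_bad))   # (Core.Box,)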
So why is the well-inferred f1 just as slow as f2? Because one(::DataType) is not type-stable in general, e.g. one(Int) and one(Float64) have different types. f3 evades this issue because T is not captured, so when the closure is compiled, T can be recognized as a constant and constant folding can fix one(T) at compile time. Unfortunately, the inferrability of captured variables is determined at parse time, when their types are still completely unknown, so closures can't do better than struct ...{...S...} ... T::S ... end and finding out later that S = DataType. It'd be nice if we could have T::Type{S2} so that S2 = Int would allow constant folding.
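To make that instability concrete, here is a minimal sketch (my own, not from the thread) of what inference can conclude when it only knows the argument is a DataType versus the exact Type{Int}:

g(T) = one(T)

Base.return_types(g, (DataType,))   # typically Any[Any]: some unknown DataType, result type unknown
Base.return_types(g, (Type{Int},))  # typically Any[Int64] on a 64-bit system: exact type, result concrete

The captured field in f1's closure is only typed DataType, so every one(T) in the hot loop hits the first, uninferrable case.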
A const global T sidesteps the problem entirely, because the compiler then knows T's value wherever it is used:

const T = Int

function f4(vcs) # fast: T is a const global
    smap = Dict("a" => [1:5000;], "b" => [1:5000;])
    @threads for chunk in makechunks(vcs, nthreads())
        c = 0
        for v in chunk
            for s in v
                for idx in smap[s]
                    c += one(T)
                end
            end
        end
    end
    return
end
f4(fill(["a", "b"], 5000))
@time f4(fill(["a", "b"], 5000)) # 0.000157 seconds (21 allocations: 157.688 KiB)
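As a sanity check (again a sketch of my own, not part of the original benchmark), you can ask inference directly what one(T) returns against the const:

one_of_const() = one(T)              # T is the const global defined above

Base.return_types(one_of_const, ())  # typically Any[Int64]: the const's value is
                                     # visible to the compiler, so one(T) constant-folds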
julia> function f1(vcs) # very slow
           T = Int
           smap = Dict("a" => [1:5000;], "b" => [1:5000;])
           @threads for chunk in makechunks(vcs, nthreads())
               c = 0
               for v in chunk
                   for s in v
                       for idx in smap[s]
                           c += one($T)
                       end
                   end
               end
           end
           return
       end
ERROR: syntax: "$" expression outside quote around REPL[4]:9
Stacktrace:
 [1] top-level scope
   @ REPL[4]:1
Actually, the interpolation works with @spawn, not with @threads:
julia> using Base.Threads: @spawn

julia> function f1_2(vcs) # very slow
           T = Int
           smap = Dict("a" => [1:500;], "b" => [1:500;])
           @sync for chunk in makechunks(vcs, nthreads())
               @spawn begin
                   c = zero($T)
                   for v in $chunk
                       for s in v
                           for idx in $smap[s]
                               c += one($T)
                           end
                       end
                   end
               end
           end
           return
       end
f1_2 (generic function with 1 method)

julia> @time f1_2(fill(["a","b"], 500))
  0.045161 seconds (538.60 k allocations: 10.318 MiB, 4.56% gc time, 417.59% compilation time)

julia> @time f1_2(fill(["a","b"], 500))
  0.021519 seconds (495.98 k allocations: 7.588 MiB, 8.42% gc time)
julia> function f1_2(vcs) # very slow
           T = Int
           smap = Dict("a" => [1:500;], "b" => [1:500;])
           @threads for chunk in makechunks(vcs, nthreads())
               let T = T, chunk = chunk, smap = smap
                   c = zero(T)
                   for v in chunk
                       for s in v
                           for idx in smap[s]
                               c += one(T)
                           end
                       end
                   end
               end
           end
           return
       end
f1_2 (generic function with 1 method)

julia> @time f1_2(fill(["a","b"], 500))
  0.048589 seconds (508.51 k allocations: 8.427 MiB, 486.98% compilation time)

julia> @time f1_2(fill(["a","b"], 500))
  0.021447 seconds (495.97 k allocations: 7.588 MiB, 12.07% gc time)
@spawn’s interpolation of T doesn’t make the instance Int visible and constant to the compiler the way a const or a purely local variable could; it can’t, because the macro is expanded at parse time, when T is just a symbol in f1_2. It doesn’t capture the variable, nor does it copy the instance; it captures the instance itself, like a hidden argument. The most the compiler can automatically infer for it is DataType.
Making let blocks helps the inferrability of captured variables, but again, inferring T::DataType doesn’t make one(T) inferrable; constant propagation is needed for that, and we can’t add method type parameters to the hidden closures to help with it.
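We can, however, write the type parameter ourselves by moving the hot loop behind a function barrier. A hedged sketch of that workaround (my own variant, reusing the makechunks helper and imports from the MWE above; count_chunk and f1_barrier are hypothetical names):

# The kernel receives T through dispatch, so inside it T is a method static
# parameter and zero(T)/one(T) are compiled for the concrete type.
function count_chunk(chunk, smap, ::Type{T}) where {T}
    c = zero(T)
    for v in chunk
        for s in v
            for idx in smap[s]
                c += one(T)
            end
        end
    end
    return c
end

function f1_barrier(vcs)
    T = Int
    smap = Dict("a" => [1:5000;], "b" => [1:5000;])
    @threads for chunk in makechunks(vcs, nthreads())
        # T is still captured as a DataType here, but only this one dynamic
        # call per chunk pays for it; the inner loops run type-stably.
        count_chunk(chunk, smap, T)
    end
    return
end

The dynamic dispatch then happens once per chunk instead of once per iteration, which is why a barrier like this recovers essentially all of the performance.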
Oh, that’s interesting. But that suggests a pattern: instead of T = Int, use T = 1, so that zero(T) means zero(1) and one(T) means one(1). That solves the issue with the captured variable:
julia> function f1_2(vcs) # fast now
           T = 1
           smap = Dict("a" => [1:500;], "b" => [1:500;])
           @threads for chunk in makechunks(vcs, nthreads())
               c = zero(T)
               for v in chunk
                   for s in v
                       for idx in smap[s]
                           c += one(T)
                       end
                   end
               end
           end
           return
       end
f1_2 (generic function with 1 method)

julia> @time f1_2(fill(["a","b"], 500))
  0.022976 seconds (12.30 k allocations: 864.859 KiB, 99.68% compilation time)

julia> @time f1_2(fill(["a","b"], 500))
  0.000104 seconds (16 allocations: 17.453 KiB)
and it works if the value of T is obtained from an input argument:
julia> function f1_2(vcs, val) # fast now
           T = one(val)
           smap = Dict("a" => [1:500;], "b" => [1:500;])
           @threads for chunk in makechunks(vcs, nthreads())
               c = zero(T)
               for v in chunk
                   for s in v
                       for idx in smap[s]
                           c += one(T)
                       end
                   end
               end
           end
           return
       end
f1_2 (generic function with 2 methods)

julia> @time f1_2(fill(["a", "b"], 500), 1)
  0.023463 seconds (12.23 k allocations: 854.250 KiB, 99.69% compilation time)

julia> @time f1_2(fill(["a", "b"], 500), 1)
  0.000054 seconds (16 allocations: 17.453 KiB)
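The same trick works when only the element type, rather than a value, is at hand: convert the type to a value once before the threaded loop, so the closure captures a concrete instance instead of a DataType. A hedged sketch along those lines (f1_3 is a hypothetical name, reusing makechunks and the imports from the MWE):

function f1_3(vcs, ::Type{S}) where {S}
    T = one(S)  # T is now a value of concrete type S, cheap for the closure to capture
    smap = Dict("a" => [1:500;], "b" => [1:500;])
    @threads for chunk in makechunks(vcs, nthreads())
        c = zero(T)
        for v in chunk
            for s in v
                for idx in smap[s]
                    c += one(T)
                end
            end
        end
    end
    return
end

f1_3(fill(["a", "b"], 500), Int)  # usage: pass the type, capture only its one() value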