# Using an assignment outside of threaded for loop degraded performance by a huge amount

I created an MWE for this question, in practice the only difference between the `f`s functions is that `T = Int` is introduced in different scopes, however the first two versions are very slow:

``````using Base.Threads: @threads, nthreads

function makechunks(X::AbstractVector, n::Integer)
c = length(X) ÷ n
return [X[1+c*k:(k == n-1 ? end : c*k+c)] for k = 0:n-1]
end

function f1(vcs) # very slow
T = Int
smap = Dict("a" => [1:5000;], "b" => [1:5000;])
c = 0
for v in chunk
for s in v
for idx in smap[s]
c += one(T)
end
end
end
end
return
end

function f2(vcs) # very slow
T = Int
smap = Dict("a" => T[1:5000;], "b" => T[1:5000;])
T = Int
c = 0
for v in chunk
for s in v
for idx in smap[s]
c += one(T)
end
end
end
end
return
end

function f3(vcs) # fast
Q = Int
smap = Dict("a" => Q[1:5000;], "b" => Q[1:5000;])
T = Int
c = 0
for v in chunk
for s in v
for idx in smap[s]
c += one(T)
end
end
end
end
return
end
``````

timings:

``````f1(fill(["a", "b"], 5000))
@time f1(fill(["a", "b"], 5000)) # 4.124463 seconds (49.99 M allocations: 763.007 MiB, 4.31% gc time)

f2(fill(["a", "b"], 5000))
@time f2(fill(["a", "b"], 5000)) # 3.106983 seconds (49.99 M allocations: 763.007 MiB, 2.36% gc time)

f3(fill(["a", "b"], 5000))
@time f3(fill(["a", "b"], 5000)) # 0.000160 seconds (89 allocations: 164.969 KiB)
``````

Is this something expected? I’m not so well versed with multithreading to be sure but seems bad to me

1 Like

mmmh, maybe it is an instance of the problem described in this open issue: performance of captured variables in closures · Issue #15276 · JuliaLang/julia · GitHub

2 Likes

Yes, with nuance. `@threads` does put the expression into a closure that captures `T = Int` in `f1` and `f2`, but that closure issue has improved since then and un-reassigned captured variables are inferred fine. Here’s a demonstration of that:

``````julia> let
x = Int
onex() = one(x)
end
(::var"#onex#3"{DataType}) (generic function with 1 method)

julia> let
x = Int
onex() = (x = Float64; one(x))
end
(::var"#onex#9") (generic function with 1 method)
``````

In the 1st case like `f1`, there is a type parameter `DataType` for the type of the captured variable `x`, but in the 2nd case like `f2` where the closure reassigns `x`, the parameter is absent because the compiler fell back to not inferring it (variable’s value is stored in a `Core.Box`, which is like a `Ref{Any}`).

So why is the well-inferred `f1` just as slow as `f2`? Because `one(::DataType)` is not type-stable in general e.g. `one(Int)` and `one(Float64)` have different types. `f3` evades this issue because `T` is not captured, so when the closure is compiled, `T` can be recognized as a constant and constant-folding can fix `one(T)` at compile-time. Unfortunately, captured variables’ inferrability are determined at parse-time when their types are still completely unknown, so closures can’t do better than `struct ...{...S...} ... T::S ... end` and finding out later that `S=DataType`. It’d be nice if we could have `T::Type{S2}` so `S2=Int` can allow constant-folding.

1 Like

What if you make it a global const?

``````const T = Int
function f4(vcs) # very slow
smap = Dict("a" => [1:5000;], "b" => [1:5000;])
c = 0
for v in chunk
for s in v
for idx in smap[s]
c += one(T)
end
end
end
end
return
end

f4(fill(["a", "b"], 5000))
@time f4(fill(["a", "b"], 5000)) # 0.000157 seconds (21 allocations: 157.688 KiB)
``````

It is again fast

I think another workaround is to interpolate the variable, like in:

``````c += one(\$T)
``````

(You should use `c = zero(\$T)` as well, to avoid a promotion there)

``````julia> function f1(vcs) # very slow
T = Int
smap = Dict("a" => [1:5000;], "b" => [1:5000;])
c = 0
for v in chunk
for s in v
for idx in smap[s]
c += one(\$T)
end
end
end
end
return
end
ERROR: syntax: "\$" expression outside quote around REPL:9
Stacktrace:
 top-level scope
@ REPL:1
``````

If you mean this, it throws a syntax error

Actually the interpolation works with `@spawn`, not with `@threads`:

``````julia> function f1_2(vcs) # very slow
T = Int
smap = Dict("a" => [1:500;], "b" => [1:500;])
@sync for chunk in makechunks(vcs, nthreads())
@spawn begin
c = zero(\$T)
for v in \$chunk
for s in v
for idx in \$smap[s]
c += one(\$T)
end
end
end
end
end
return
end
f1_2 (generic function with 1 method)

julia> @time f1_2(fill(["a","b"], 500))
0.045161 seconds (538.60 k allocations: 10.318 MiB, 4.56% gc time, 417.59% compilation time)

julia> @time f1_2(fill(["a","b"], 500))
0.021519 seconds (495.98 k allocations: 7.588 MiB, 8.42% gc time)
``````

Which I used as a solution before (see Type-instability because of @threads boxing variables - #21 by Elrod). It should be equivalent to creating a let block, but that doesn’t seem to be doing the job here:

``````julia> function f1_2(vcs) # very slow
T = Int
smap = Dict("a" => [1:500;], "b" => [1:500;])
let T = T, chunk = chunk, smap = smap
c = zero(T)
for v in chunk
for s in v
for idx in smap[s]
c += one(T)
end
end
end
end
end
return
end
f1_2 (generic function with 1 method)

julia> @time f1_2(fill(["a","b"], 500))
0.048589 seconds (508.51 k allocations: 8.427 MiB, 486.98% compilation time)

julia> @time f1_2(fill(["a","b"], 500))
0.021447 seconds (495.97 k allocations: 7.588 MiB, 12.07% gc time)
``````

Probably there is something I’m missing.

2 Likes

`@spawn`’s interpolation of `T` doesn’t make the instance `Int` visible and constant to the compiler like a `const` or wholly local variable could; it can’t because it’s expanded at parse-time when `T` is just a symbol in `f1_2`. It doesn’t capture the variable or copy the instance either, it captures the instance like a hidden argument. Most it can automatically infer is `DataType`.

Making `let` blocks helps inferrability of captured variables, but again, inferring `T::DataType` doesn’t make `one(::T)` inferrable, constant propagation is needed for that, and we can’t add method parameters to hidden closures to help with that.

1 Like

Oh, that’s interesting. But that suggests this pattern, of instead of using `T = Int` using `T = 1`, and using `zero(T)` as `zero(1)` and `one(T)` as meaning `one(1)`. That solves the issues with the captured variable:

``````julia> function f1_2(vcs) # very slow
T = 1
smap = Dict("a" => [1:500;], "b" => [1:500;])
c = zero(T)
for v in chunk
for s in v
for idx in smap[s]
c += one(T)
end
end
end
end
return
end
f1_2 (generic function with 1 method)

julia> @time f1_2(fill(["a","b"], 500))
0.022976 seconds (12.30 k allocations: 864.859 KiB, 99.68% compilation time)

julia> @time f1_2(fill(["a","b"], 500))
0.000104 seconds (16 allocations: 17.453 KiB)
``````

and it works if one obtains the value of `T` from an input variable:

``````julia> function f1_2(vcs, val) # very slow
T = one(val)
smap = Dict("a" => [1:500;], "b" => [1:500;])
#let T = T, chunk = chunk, smap = smap
c = zero(T)
for v in chunk
for s in v
for idx in smap[s]
c += one(T)
end
end
end
#end
end
return
end
f1_2 (generic function with 2 methods)

julia> @time f1_2(fill(["a","b"], 500), 1)
0.023463 seconds (12.23 k allocations: 854.250 KiB, 99.69% compilation time)

julia> @time f1_2(fill(["a","b"], 500), 1)
0.000054 seconds (16 allocations: 17.453 KiB)
``````
1 Like

Could also pass `Int` in as a method parameter `T` at that point too.

1 Like