Declaring variables inside a for loop using Threads.@threads has a significant impact on performance

Sangjin_Park · June 28, 2024, 7:18am

Hello. This is my first time asking a question, so I apologize if it’s not in the proper format. I’m also new to programming and to Julia.

As the title says, I often write code that indexes into array data and assigns the values to new variables. This is normally fine, but when I use Threads.@threads from Base.Threads, it causes a severe slowdown (hundreds of times slower).

This code is a simple test; in practice, it’s much more complex.
Writing the code in the style marked “# good” doesn’t degrade performance, but it makes the code very long and hard to read. Why is this happening, and what is the best way to write code that is both readable and performant?

struct infomations
    nn :: Int64
    A1 :: Matrix{Float64}
    A2 :: Matrix{Float64}
    dx :: Float64
    dy :: Float64
    function infomations(nn,L)
        nn = nn+4   # pad by 2 cells per side so the 5-point stencil stays in bounds
        A1 = zeros(nn,nn)
        A2 = rand(nn,nn)
        dx = L/nn
        dy = L/nn
        new(nn,A1,A2,dx,dy)
    end
end

function get_Div(obj1)
    (;nn, A1, A2, dx, dy) = obj1   # property destructuring (Julia 1.7+)
    ny,nx  = size(A1)
    Threads.@threads for i = 3:nx-2
        for j = 3:ny-2
            # bad
            ee = A2[j,i+2]; e = A2[j,i+1]; w = A2[j,i-1]; ww = A2[j,i-2]
            nn = A2[j+2,i]; n = A2[j+1,i]; s = A2[j-1,i]; ss= A2[j-2,i]
            ∂B1∂x = (-ee + 8*e - 8*w + ww)/dx/12
            ∂B2∂y = (-nn + 8*n - 8*s + ss)/dy/12
            A1[j,i] = ∂B1∂x + ∂B2∂y

            # good
            ∂B1∂x = (-A2[j,i+2] + 8*A2[j,i+1] - 8*A2[j,i-1] + A2[j,i-2])/dx/12
            ∂B2∂y = (-A2[j+2,i] + 8*A2[j+1,i] - 8*A2[j-1,i] + A2[j-2,i])/dy/12
            A1[j,i] = ∂B1∂x + ∂B2∂y
        end
    end
end

function run!(obj1, iter)
    for i = 1:iter
        get_Div(obj1)
    end
end

obj1 = infomations(2000,3)
@time run!(obj1, 200)

Salmon · June 28, 2024, 7:39am

Your “bad” and “good” examples should both be equally performant, as far as I can tell.

Unfortunately, I cannot run any code right now, but here are a few ideas:

Use BenchmarkTools.jl with @btime (are you sure you have not measured compilation time?). Your example should run quite fast, so it’s possible that the overhead of using many threads is too large to give any speedup.
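A minimal sketch of the compilation-time point, using only Base rather than BenchmarkTools.jl. The functions `scaled_sum` and `measure` are hypothetical stand-ins for `run!(obj1, 200)` from the question. The first top-level call includes JIT compilation; once the method is compiled, `@elapsed` and `@allocated` inside a function report only run time and heap allocations. BenchmarkTools’ `@btime` goes further by taking many samples (interpolate globals, e.g. `@btime run!($obj1, 200)`).

```julia
# Hypothetical stand-in workload (not from the thread): a simple type-stable loop.
function scaled_sum(A, c)
    s = zero(eltype(A))
    for x in A
        s += c * x
    end
    return s
end

# Measure inside a function so that non-constant globals do not add dispatch overhead.
# scaled_sum is already compiled by the top-level call below, so this is run time only.
function measure(A, c)
    t = @elapsed scaled_sum(A, c)         # execution time of one warmed-up call
    bytes = @allocated scaled_sum(A, c)   # heap allocations of one call
    return t, bytes
end

A = rand(1_000_000)
t_first = @elapsed scaled_sum(A, 2.0)     # first call: includes compilation
t, bytes = measure(A, 2.0)
println("first call: $t_first s, warmed-up call: $t s, allocated: $bytes bytes")
```

A large gap between the first and the warmed-up timing means the original `@time` figure was dominated by compilation rather than by the threaded loop.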

Maybe your example is too far removed from your actual use case, but it looks like you are doing convolutions with a small kernel. I highly recommend the package Stencils.jl for that kind of thing; it has multithreading built in and is quite performant. (There are also FFT-based options, see DSP.jl, but I think those only become fast once your kernels get larger.)
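For reference, the two stencil lines in the question (computing ∂B1∂x and ∂B2∂y) are the standard fourth-order central difference, which is exactly such a small kernel. Writing h for dx or dy, and mapping ee, e, w, ww (or nn, n, s, ss) to f_{i+2}, f_{i+1}, f_{i-1}, f_{i-2}:

```latex
\frac{\partial f}{\partial x}\bigg|_i \approx \frac{f_{i-2} - 8 f_{i-1} + 8 f_{i+1} - f_{i+2}}{12\,h} + O(h^4),
\qquad
\text{weights on } (f_{i-2}, f_{i-1}, f_i, f_{i+1}, f_{i+2}) = \frac{1}{12h}\,\bigl[\,1,\; -8,\; 0,\; 8,\; -1\,\bigr]
```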

Salmon · June 28, 2024, 8:08am

This is the culprit: you overwrite nn, which is unpacked from the struct as an Int, and later assign a Float64 to it. Changing the second nn to another name recovers the lost performance.
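A sketch of the question’s `get_Div` with only that change applied: the temporary `nn` is renamed to `nn2` (the new name is arbitrary), so the loop body no longer assigns to the `nn` unpacked from the struct. The struct is copied from the question so the block runs on its own; the `(; ...) = obj` destructuring syntax needs Julia 1.7 or later. The duplicated “good” computation is dropped because it produced the same value.

```julia
struct infomations
    nn :: Int64
    A1 :: Matrix{Float64}
    A2 :: Matrix{Float64}
    dx :: Float64
    dy :: Float64
    function infomations(nn, L)
        nn = nn + 4                # pad by 2 cells per side for the 5-point stencil
        A1 = zeros(nn, nn)
        A2 = rand(nn, nn)
        dx = L/nn
        dy = L/nn
        new(nn, A1, A2, dx, dy)
    end
end

function get_Div(obj1)
    (; nn, A1, A2, dx, dy) = obj1   # nn (padded grid size) is unpacked but not reassigned below
    ny, nx = size(A1)
    Threads.@threads for i = 3:nx-2
        for j = 3:ny-2
            # Neighbours along x (column index i) and y (row index j); all fresh locals.
            ee = A2[j,i+2]; e = A2[j,i+1]; w = A2[j,i-1]; ww = A2[j,i-2]
            nn2 = A2[j+2,i]; n = A2[j+1,i]; s = A2[j-1,i]; ss = A2[j-2,i]   # was `nn`
            ∂B1∂x = (-ee + 8*e - 8*w + ww)/dx/12
            ∂B2∂y = (-nn2 + 8*n - 8*s + ss)/dy/12
            A1[j,i] = ∂B1∂x + ∂B2∂y
        end
    end
    return nothing
end

obj1 = infomations(20, 3)
get_Div(obj1)
```

With the rename, the readable “bad” style and the inline “good” style compute identical values.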

sgaure · June 28, 2024, 8:09am

@code_warntype reveals that the variable nn is boxed, i.e. its type cannot be inferred, so the type of ∂B2∂y cannot be inferred either. This leads to allocations, which are devastating for performance in parallel runs. Moreover, the program becomes incorrect, because the single boxed nn is shared among all the threads, so you get race conditions.
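The toy closures below are hypothetical (they are not from the thread) and illustrate why only the threaded loop is affected. `Threads.@threads` wraps the loop body in a closure, and Julia stores a captured variable that gets reassigned (here from an Int to a Float64) in an untyped `Core.Box`. A variable that is only read is stored as a concretely typed field, and a variable declared `local` inside the closure is not captured at all. Inspecting the closure’s field types shows the difference without threads or `@code_warntype`.

```julia
# Captured and reassigned inside the closure (Int, then Float64): stored in a Core.Box.
function boxed_closure()
    nn = 4
    f = () -> begin
        nn = 2.5          # assigns to the OUTER nn
        nn * 2
    end
    return f
end

# Captured but only read: stored as a concretely typed field.
function unboxed_closure()
    nn = 4
    f = () -> nn * 2
    return f
end

# `local` creates a fresh variable inside the closure, so the outer nn is not captured.
function local_closure()
    nn = 4
    f = () -> begin
        local nn = 2.5    # shadows the outer nn instead of reassigning it
        nn * 2
    end
    return f
end

for make in (boxed_closure, unboxed_closure, local_closure)
    f = make()
    println(make, ": fields = ", fieldnames(typeof(f)), ", types = ", fieldtypes(typeof(f)))
end
```

The `local` form is an alternative to renaming when you want to keep short names like `nn` inside a threaded loop.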

Rename your temporary nn in the “bad code” example.

julia> @code_warntype get_Div(obj1)
...
  nn::Core.Box
...

Sangjin_Park · June 28, 2024, 10:16pm

Oh… I must have missed this. Thank you so much.
It’s scary that reusing a name doesn’t throw an error and only shows up as a performance problem.
I’ll check out your other advice as well, thanks again.

Sangjin_Park · June 28, 2024, 10:18pm

Thanks for introducing me to this useful macro; I know a bit more about Julia thanks to you. Have a nice day.
