Problem benchmarking multi-threaded code

Hello,
I am trying to benchmark a function that uses thread parallelization and compare the result with the non-parallelized version of the same function.

using Base.Threads

nx = 10000
a = zeros(nx)
b = rand(nx)
dindicies = 2:(nx-1)

function update!(A,B)
    for i=dindicies
        @inbounds A[i] = B[i] - B[i-1]
    end
end

function update_thread!(A,B)
    @threads for i=dindicies
        @inbounds A[i] = B[i] - B[i-1]
    end
end

I get the following results with @btime:

julia> @btime update!($a,$b)
  1.329 ms (68455 allocations: 1.20 MiB)
julia> @btime update_thread!($a,$b)
  5.796 μs (21 allocations: 2.81 KiB)

Why does the non-parallelized version allocate far more memory than the parallelized version?

Your functions use global variables (dindicies is a non-constant global). That might not be optimal.

I still don't quite understand. Could you give me more hints? What confuses me is the big difference in allocations.

Try declaring all variables used by the function within the function scope:

function update!(A,B)
    # Local variables: their types are known when the function is compiled
    nx = 10000
    dindicies = 2:(nx-1)
    for i=dindicies
        @inbounds A[i] = B[i] - B[i-1]
    end
end
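
The version above hard-codes nx inside the function. If that is too restrictive, one alternative is to pass the index range as an argument, so the compiler still knows its type. Here is a minimal sketch of that idea; the name update_arg! and the parameter inds are only illustrative and do not come from the original post. Declaring the global as const would be another option.

```julia
# Hypothetical variant: the index range is an argument instead of a global,
# so its concrete type (UnitRange{Int}) is known when the function compiles.
function update_arg!(A, B, inds)
    for i in inds
        @inbounds A[i] = B[i] - B[i-1]
    end
    return A
end

nx = 10000
a = zeros(nx)
b = rand(nx)

# The caller decides which indices to update.
update_arg!(a, b, 2:(nx-1))
```

With BenchmarkTools this could be timed as @btime update_arg!($a, $b, $(2:(nx-1))), interpolating every argument in the same way as in the original benchmark.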

See the Performance Tips section of the Julia manual for more details.
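
As for why the threaded version allocates so little: as far as I understand, @threads wraps the loop body in a closure and reads the range once before the loop runs, so the loop itself is compiled with a known range type. In the plain version, the untyped global is what makes every iteration go through dynamic dispatch and allocate. A hand-written function barrier should have a similar effect. The sketch below is only an illustration of that idea, not what @threads literally expands to; kernel! and update_barrier! are made-up names.

```julia
nx = 10000
a = zeros(nx)
b = rand(nx)
dindicies = 2:(nx-1)   # non-constant global, as in the original post

# Inner kernel: `inds` is an argument, so its type is known when this compiles.
function kernel!(A, B, inds)
    for i in inds
        @inbounds A[i] = B[i] - B[i-1]
    end
    return nothing
end

# Outer function: reads the global once, then hands it to the kernel.
# Only this single call is dynamically dispatched, not every loop iteration.
function update_barrier!(A, B)
    kernel!(A, B, dindicies)
    return nothing
end

update_barrier!(a, b)
```

If this reasoning is right, @btime update_barrier!($a, $b) should report only a handful of allocations, in line with the threaded version, rather than tens of thousands.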