Threads memory allocations

Hi. I’m really trying to understand parallel programming in Julia. Can someone please explain to me why using threads requires so many memory allocations?

using BenchmarkTools
using .Threads

function test_serial(u)
    for i in eachindex(u)
        u[i] = threadid()
    end
end

function test_threads(u)
    @threads for i in eachindex(u)
        u[i] = threadid()
    end
end

u = zeros(Int64, 1000)

julia> @benchmark test_serial($u)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     336.647 ns (0.00% GC)
  median time:      347.059 ns (0.00% GC)
  mean time:        373.924 ns (0.00% GC)
  maximum time:     922.167 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     221

julia> @benchmark test_threads($u)
BenchmarkTools.Trial:
  memory estimate:  2.72 KiB
  allocs estimate:  29
  --------------
  minimum time:     8.099 μs (0.00% GC)
  median time:      9.700 μs (0.00% GC)
  mean time:        10.360 μs (0.00% GC)
  maximum time:     136.400 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

I am not sure if that’s “so many allocations”, but as far as I know, this is just the overhead of using the parallel threading library.
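One way to check that this is a fixed per-call overhead rather than something that grows with the data is to compare the allocations for a small and a large array. The sketch below assumes Julia was started with more than one thread (for example `julia -t 4`). The helper name `fill_threadid!` is made up for illustration, and the exact byte counts will vary with the Julia version and thread count.

```julia
using Base.Threads

# Same loop as test_threads above, wrapped in a hypothetical helper.
function fill_threadid!(u)
    @threads for i in eachindex(u)
        u[i] = threadid()
    end
    return u
end

small = zeros(Int, 1_000)
large = zeros(Int, 1_000_000)

# Warm up so that compilation is not counted in the measurements.
fill_threadid!(small)
fill_threadid!(large)

println("threads:     ", nthreads())
println("small array: ", @allocated(fill_threadid!(small)), " bytes")
println("large array: ", @allocated(fill_threadid!(large)), " bytes")
```

If the two byte counts come out about the same, the allocations belong to setting up the threaded loop, not to the per-element work.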

The for loop allocates tasks and fetches their results, so it essentially reduces to:

julia> @benchmark threadid()
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.150 ns (0.00% GC)
  median time:      4.282 ns (0.00% GC)
  mean time:        4.453 ns (0.00% GC)
  maximum time:     37.888 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

vs

julia> @benchmark ((t = @async threadid()); fetch(t))
BenchmarkTools.Trial: 
  memory estimate:  704 bytes
  allocs estimate:  7
  --------------
  minimum time:     14.258 μs (0.00% GC)
  median time:      16.535 μs (0.00% GC)
  mean time:        18.000 μs (0.00% GC)
  maximum time:     97.452 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Multiply the per-task figures by your number of threads to get roughly the memory estimate of the threaded loop.
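To make the "one task per thread" picture concrete, here is a rough hand-written equivalent of a threaded loop. It is only a sketch of the idea, not how `@threads` is actually implemented. The function name `fill_chunked!` and the `ntasks` keyword are made up for illustration. Each `@spawn` creates one `Task`, which is where the allocations come from.

```julia
using Base.Threads

# Split the index range into about `ntasks` chunks and run each chunk in its own task.
# `ntasks` is a hypothetical keyword and is assumed to be positive.
function fill_chunked!(u; ntasks = nthreads())
    chunk_len = max(1, cld(length(u), ntasks))
    tasks = map(Iterators.partition(eachindex(u), chunk_len)) do chunk
        # One Task allocation per chunk, regardless of how long the chunk is.
        @spawn begin
            for i in chunk
                u[i] = threadid()
            end
        end
    end
    foreach(wait, tasks)  # block until every chunk has finished
    return u
end

u = zeros(Int, 1000)
fill_chunked!(u)
```

Benchmarking `fill_chunked!($u)` with different `ntasks` values should show the allocation count scaling with the number of tasks rather than with `length(u)`.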
