Essentially, I call a function within nested loops. In serial, no heap allocations occur within the function. When the outer loop runs on multiple threads via `@threads`, allocations are reported within the function. Because the threaded case also runs much slower, I suspect this is not only a reporting issue.
I know there are related issues elsewhere, but I haven't seen anything that would immediately solve this problem. In particular, it would be good to know whether this is something subtle in the code that `@threads` is operating on, or a more fundamental problem with `@threads` that means I need to switch to some other threading construct. I have tried `@batch`, which has the same issue.
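One way to separate a reporting artifact from real allocations is to measure with `@allocated` in a session started without `--track-allocation`, and to compare the working loop against an empty `@threads` loop. The following is only a sketch under stated assumptions: `kernel`, `fill_rows!`, `empty_loop`, and `measure` are hypothetical names, and plain `Float64` tuples stand in for `SVector` so that it runs without any packages.

```julia
using Base.Threads

# Hypothetical stand-in for the kernel: a plain tuple instead of an SVector.
kernel(v::NTuple{3,Float64}, i::Int) = v .^ i ./ (1, 2, 3)

# Same loop structure as the threaded loop in question.
function fill_rows!(result, vectors)
    @threads for i in axes(result, 1)
        for (j, v) in enumerate(vectors)
            result[i, j] = kernel(v, i)
        end
    end
    return result
end

# A threaded loop that does no work, to estimate the cost of @threads itself.
function empty_loop(m)
    @threads for i in 1:m
        nothing
    end
    return nothing
end

function measure(m = 100, n = 200)
    vectors = [(Float64(i), Float64(i - 1), Float64(i + 1)) for i in 1:n]
    result = Matrix{NTuple{3,Float64}}(undef, m, n)
    fill_rows!(result, vectors)   # warm up so compilation is not measured
    empty_loop(m)
    work_bytes = @allocated fill_rows!(result, vectors)
    empty_bytes = @allocated empty_loop(m)
    return (work_bytes = work_bytes, empty_bytes = empty_bytes, result = result)
end

stats = measure()
println("threads: ", nthreads())
println("working loop: ", stats.work_bytes, " bytes, empty loop: ", stats.empty_bytes, " bytes")
```

If the two byte counts are similar, the extra allocations likely come from the tasks that `@threads` sets up rather than from the kernel. Printing `nthreads()` also confirms that the thread count set in the environment actually reached the Julia process.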
The following example may fail to be ‘minimal’, but it is complete and demonstrates the issue.
#!/usr/bin/env julia

using Profile
using .Threads
using StaticArrays
using BenchmarkTools


function nonsense_work(v::SVector{3,F}, i::Int) where {F}
    return v.^i ./ SVector{3,F}(1, 2, 3)
end


function body()
    m, n = 100, 200
    some_vectors = [SVector(i, i-1, i+1) for i in 1:n]
    result = Array{SVector{3,Float64},2}(undef, m, n)
    @threads for i in 1:m
        for (j, v) in enumerate(some_vectors)
            result[i,j] = nonsense_work(v, i)
        end
    end
end


function main()
    body()
    Profile.clear_malloc_data()
    body()
    @time body()
    @btime body()
end


main()
$ JULIA_NUM_THREADS=1
$ julia --track-allocation=user threadallocmwe.jl
0.009075 seconds (9 allocations: 474.188 KiB)
8.823 ms (9 allocations: 474.19 KiB)
$ JULIA_NUM_THREADS=4
$ julia --track-allocation=user threadallocmwe.jl
0.040592 seconds (25 allocations: 475.547 KiB)
29.747 ms (24 allocations: 475.52 KiB)
In the first case (JULIA_NUM_THREADS=1), the allocation report shows:
        - function nonsense_work(v::SVector{3,F}, i::Int) where {F}
        0     return v.^i ./ SVector{3,F}(1, 2, 3)
        - end
        -
        -
        - function body()
        -     m, n = 100, 200
        0     some_vectors = [SVector(i, i-1, i+1) for i in 1:n]
534809120     result = Array{SVector{3,Float64},2}(undef, m, n)
    53472     @threads for i in 1:m
        -         for (j, v) in enumerate(some_vectors)
        -             result[i,j] = nonsense_work(v, i)
        -         end
        -     end
        - end
and in the second (JULIA_NUM_THREADS=4):
        - function nonsense_work(v::SVector{3,F}, i::Int) where {F}
    15584     return v.^i ./ SVector{3,F}(1, 2, 3)
        - end
        -
        -
        - function body()
        -     m, n = 100, 200
        0     some_vectors = [SVector(i, i-1, i+1) for i in 1:n]
127701280     result = Array{SVector{3,Float64},2}(undef, m, n)
    12768     @threads for i in 1:m
        -         for (j, v) in enumerate(some_vectors)
        -             result[i,j] = nonsense_work(v, i)
        -         end
        -     end
        - end
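To check whether the allocations attributed to the kernel depend on the threading construct, one experiment is to put the per-row work behind a function barrier and spawn one task per chunk of rows with `@spawn`. This is a sketch, not a confirmed fix: `kernel`, `fill_chunk!`, and `fill_spawned!` are hypothetical names, and plain `Float64` tuples again stand in for `SVector` so that no packages are needed.

```julia
using Base.Threads

# Hypothetical stand-in for the kernel: a plain tuple instead of an SVector.
kernel(v::NTuple{3,Float64}, i::Int) = v .^ i ./ (1, 2, 3)

# Function barrier: all arguments have concrete types inside this function,
# so nothing captured by a closure is touched in the hot loop.
function fill_chunk!(result, vectors, rows)
    for i in rows, (j, v) in enumerate(vectors)
        result[i, j] = kernel(v, i)
    end
    return nothing
end

# Split the rows into roughly one chunk per task and run each chunk in its own task.
function fill_spawned!(result, vectors; ntasks = nthreads())
    rows = axes(result, 1)
    chunks = Iterators.partition(rows, cld(length(rows), ntasks))
    tasks = [@spawn fill_chunk!(result, vectors, chunk) for chunk in chunks]
    foreach(wait, tasks)
    return result
end

vectors = [(Float64(i), Float64(i - 1), Float64(i + 1)) for i in 1:200]
result = Matrix{NTuple{3,Float64}}(undef, 100, 200)
fill_spawned!(result, vectors)

# Serial reference for comparison.
expected = [kernel(v, i) for i in 1:100, v in vectors]
println("matches serial result: ", result == expected)
```

Each task writes to a disjoint set of rows, so no locking is needed. If allocations are still attributed to the kernel with this structure, the threading macro itself is less likely to be the cause.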