I’m going through the Parallel Computing course at juliaacademy.com. They give this example of a prefix code being rewritten in parallel. However, the video does not go through it, and when I run it on my own, I get a lot of memory allocations in the benchmark for prefix_threads!
that I don’t know what it’s from.
Can anyone tell me why it’s doing so many allocations?
function prefix_serial!(y, ⊕)
for i=2:length(y)
y[i] = y[i-1] ⊕ y[i]
end
y
end
function prefix!(y, ⊕)
l=length(y)
k=ceil(Int, log2(l))
@inbounds for j=1:k, i=2^j:2^j:min(l, 2^k) #"reduce"
y[i] = y[i-2^(j-1)] ⊕ y[i]
end
@inbounds for j=(k-1):-1:1, i=3*2^(j-1):2^j:min(l, 2^k) #"expand"
y[i] = y[i-2^(j-1)] ⊕ y[i]
end
y
end
using .Threads
function prefix_threads!(y, ⊕)
l=length(y)
k=ceil(Int, log2(l))
for j=1:k
@threads for i=2^j:2^j:min(l, 2^k) #"reduce"
@inbounds y[i] = y[i-2^(j-1)] ⊕ y[i]
end
end
for j=(k-1):-1:1
@threads for i=3*2^(j-1):2^j:min(l, 2^k) #"expand"
@inbounds y[i] = y[i-2^(j-1)] ⊕ y[i]
end
end
return y
end
A = rand(500_000);
using BenchmarkTools
@btime prefix_serial!($(copy(A)), +);
@btime prefix!($(copy(A)), +);
@btime prefix_threads!($(copy(A)), +);
1.129 ms (0 allocations: 0 bytes)
2.130 ms (0 allocations: 0 bytes)
1.297 ms (1160 allocations: 126.78 KiB)