 # Allocations on threaded prefix algorithm

I’m going through the Parallel Computing course at juliaacademy.com. They give this example of a prefix code being rewritten in parallel. However, the video does not go through it, and when I run it on my own, I get a lot of memory allocations in the benchmark for `prefix_threads!` that I don’t know what it’s from.

Can anyone tell me why it’s doing so many allocations?

``````function prefix_serial!(y, ⊕)
for i=2:length(y)
y[i] = y[i-1] ⊕ y[i]
end
y
end

function prefix!(y, ⊕)
l=length(y)
k=ceil(Int, log2(l))
@inbounds for j=1:k, i=2^j:2^j:min(l, 2^k)              #"reduce"
y[i] = y[i-2^(j-1)] ⊕ y[i]
end
@inbounds for j=(k-1):-1:1, i=3*2^(j-1):2^j:min(l, 2^k) #"expand"
y[i] = y[i-2^(j-1)] ⊕ y[i]
end
y
end

l=length(y)
k=ceil(Int, log2(l))
for j=1:k
@inbounds y[i] = y[i-2^(j-1)] ⊕ y[i]
end
end
for j=(k-1):-1:1
@inbounds y[i] = y[i-2^(j-1)] ⊕ y[i]
end
end
return y
end

A = rand(500_000);

using BenchmarkTools
@btime prefix_serial!(\$(copy(A)), +);
@btime prefix!(\$(copy(A)), +);