I ran into some performance discrepancies, which I do not completely understand. Maybe someone of you has an idea?

I want to compute the product of a vector of elements of length `2^k`

(for simplicity). But instead

of computing the product directly I want to compute intermediate results and store them in

an array (representing a binary tree: The leafs are `x1`

, `x2`

, …, the parents of those are `x1*x2`

, `x3*x4`

, and those parents are `x1*x2*x3*x4`

, etc).

Here is a implementation I ended up with (details are probably not that important)

```
function comp_level(n::Integer)
l = 0
n = n >> 1
while n > 0
l += 1
n = n >> 1
end
l
end
function evaltree!(tree, x::Vector{T}) where T
n = length(x)
levels = comp_level(n)
l = levels-1
k = 1 << l # = 2^l
i = 2*k - 1
j = n
while i ≥ k
tree[i] = x[j] * x[j - 1]
i -= 1
j -= 2
end
kk = k
for l = levels-2:-1:0
k = 1 << l # = 2^l
i = 2*k - 1
j = kk
while i ≥ k
tree[i] = tree[j] * tree[j + 1]
i -= 1
j += 2
end
kk = k
end
tree
end
```

Now I did some benchmarking against the naive multiplication:

```
N = 256
tree = Vector{Float64}(N - 1)
u = rand(N) * 3
@btime evaltree!($tree, $u) # 339.703 ns
@btime prod($u) # 35.891 ns
```

That’s somewhat in the range that I expected.

But the weird part is when I do the benchmark for `Complex128`

```
N = 256
tree3 = Vector{Complex128}(N - 1)
u3 = rand(Complex128, N) * 3
@btime evaltree!($tree3, $u3) # 498.309 ns
@btime prod($u3) # 719.548 ns
```

The naive implementation becomes slower!

Does anybody has a clue what is happening here?