Mysterious performance degradation involving similar(), view() and recursion

I was writing a function that involves the usage of similar() view() and recursion and below is the best MWE I can come up with. My original function is more complex.

I am running this on Julia 0.6.1 on Windows 10 64bit.

The only difference between abc() and abc2() below is the lines

by_sim = similar(by)
val1 = similar(val)

they are either placed before the if statement if hi <= 50; return Dict(by[1] => val[1]); or after with abc() being the former. The number of memory allocations is double in the abc2() vs abc() which I thought was odd. Actually this example doesn’t give much difference in timing, but allocation is a lot higher.

In my more complicated function, a similar change actually caused a 10x drop in timings. So I wanted to report this issue.

by = rand(Int8(1):Int8(100), 10_000_000)
val = Int32.(similar(by))

function abc(by, val; j = 1)
    hi = length(by)
    by_sim = similar(by)
    val1 = similar(val)
    if hi <= 50;  return Dict(by[1] => val[1]);  end


    by_sim[rand(1:hi,hi)] = by[rand(1:hi,hi)]
    val1[rand(1:hi,hi)] = val[rand(1:hi,hi)]

    if j == 2
        return Dict(by_sim[1] => val1[1])
    else
        ii = unique(Int.(round.(collect(linspace(0,hi)))))
        return reduce(merge, [abc(view(by_sim,i1+1:i2), view(val1,i1+1:i2); j = j + 1) for (i1,i2) in zip(ii[1:end-1],ii[2:end])])
    end
end

using Compat, BenchmarkTools
@btime abc(by, val)

function abc2(by,  val; j = 1)
    hi = length(by)
    if hi <= 50;  return Dict(by[1] => val[1]);  end
    by_sim = similar(by)
    val1 = similar(val)

    by_sim[rand(1:hi,hi)] = by[rand(1:hi,hi)]
    val1[rand(1:hi,hi)] = val[rand(1:hi,hi)]

    if j == 2
        return Dict(by_sim[1] => val1[1])
    else
        ii = unique(Int.(round.(collect(linspace(0,hi)))))
        return reduce(merge, [abc2(view(by_sim,i1+1:i2), view(val1,i1+1:i2); j = j + 1) for (i1,i2) in zip(ii[1:end-1],ii[2:end])])
    end
end

@btime abc2(by, val)

similar allocates output arrays, but your if hi <= 50 line allows for a quick return that doesn’t make use of by_sim or val1. By placing the similar lines after the if, you can skip the allocation when that condition holds.

The strange thing is by placing it after it’s slower.

Not on my machine.

julia> @btime abc(by, val)
  2.751 s (1542 allocations: 801.21 MiB)

julia> @btime abc2(by, val)
  2.603 s (3345 allocations: 801.27 MiB)

I didn’t look into the allocations, though.

one is 1542 allocation the other being 3000+ allocations. It’s in the
outpur you showed and I got the same.

This is the strange thing. In my more complicated example doing similar after the if statement actually increases running time! That’s why it’s strange and unintuitive.