Mysterious performance degradation involving similar(), view() and recursion

bug
performance

#1

I was writing a function that involves the usage of similar() view() and recursion and below is the best MWE I can come up with. My original function is more complex.

I am running this on Julia 0.6.1 on Windows 10 64bit.

The only difference between abc() and abc2() below is the lines

by_sim = similar(by)
val1 = similar(val)

they are either placed before the if statement if hi <= 50; return Dict(by[1] => val[1]); or after with abc() being the former. The number of memory allocations is double in the abc2() vs abc() which I thought was odd. Actually this example doesn’t give much difference in timing, but allocation is a lot higher.

In my more complicated function, a similar change actually caused a 10x drop in timings. So I wanted to report this issue.

by = rand(Int8(1):Int8(100), 10_000_000)
val = Int32.(similar(by))

function abc(by, val; j = 1)
    hi = length(by)
    by_sim = similar(by)
    val1 = similar(val)
    if hi <= 50;  return Dict(by[1] => val[1]);  end


    by_sim[rand(1:hi,hi)] = by[rand(1:hi,hi)]
    val1[rand(1:hi,hi)] = val[rand(1:hi,hi)]

    if j == 2
        return Dict(by_sim[1] => val1[1])
    else
        ii = unique(Int.(round.(collect(linspace(0,hi)))))
        return reduce(merge, [abc(view(by_sim,i1+1:i2), view(val1,i1+1:i2); j = j + 1) for (i1,i2) in zip(ii[1:end-1],ii[2:end])])
    end
end

using Compat, BenchmarkTools
@btime abc(by, val)

function abc2(by,  val; j = 1)
    hi = length(by)
    if hi <= 50;  return Dict(by[1] => val[1]);  end
    by_sim = similar(by)
    val1 = similar(val)

    by_sim[rand(1:hi,hi)] = by[rand(1:hi,hi)]
    val1[rand(1:hi,hi)] = val[rand(1:hi,hi)]

    if j == 2
        return Dict(by_sim[1] => val1[1])
    else
        ii = unique(Int.(round.(collect(linspace(0,hi)))))
        return reduce(merge, [abc2(view(by_sim,i1+1:i2), view(val1,i1+1:i2); j = j + 1) for (i1,i2) in zip(ii[1:end-1],ii[2:end])])
    end
end

@btime abc2(by, val)

#2

similar allocates output arrays, but your if hi <= 50 line allows for a quick return that doesn’t make use of by_sim or val1. By placing the similar lines after the if, you can skip the allocation when that condition holds.


#3

The strange thing is by placing it after it’s slower.


#4

Not on my machine.

julia> @btime abc(by, val)
  2.751 s (1542 allocations: 801.21 MiB)

julia> @btime abc2(by, val)
  2.603 s (3345 allocations: 801.27 MiB)

I didn’t look into the allocations, though.


#5

one is 1542 allocation the other being 3000+ allocations. It’s in the
outpur you showed and I got the same.

This is the strange thing. In my more complicated example doing similar after the if statement actually increases running time! That’s why it’s strange and unintuitive.