Abstract repeated code in a method without overhead

When writing a method I used some child (inner?) functions to avoid copy/pasting code that was used repeatedly. The functions are not that complicated, so I figured the compiler would just inline them, there would be no overhead, and the code would be easier to read. What I actually saw was memory allocations going from 0 to tens of KiB, with the associated drop in performance (Julia 1.5.3 on Linux).

using BenchmarkTools

function test(data, offset)
    avail = ()->length(data) - offset + 1
    block = ()->reinterpret(UInt32, view(data, offset:(offset+8)))

    total = 0
    while offset < length(data)
        total += data[offset]
        offset += 1
    end
    return total
end

@benchmark test(d, 1) setup=(d=rand(Int, 1024))

What I would like to do is use the avail() and block() helpers without the performance hit of the memory allocations. For example, changing the while loop to while avail() > 0, with no callbacks or anything complicated. Is there another way to abstract that code out? I feel like a macro (if you can even define macros in a function scope) would be overkill, but maybe that’s the way to go?
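
Concretely, this is roughly what I would like to be able to write (same names as my example above; as written it still hits the same allocation problem, since avail captures offset, which is reassigned in the loop):

function test(data, offset)
    avail = () -> length(data) - offset + 1
    block = () -> reinterpret(UInt32, view(data, offset:(offset + 8)))

    total = 0
    while avail() > 0      # reuse the helper instead of repeating the bounds logic
        total += data[offset]
        offset += 1
    end
    return total
end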

If you run the @benchmark line above you will see results like:

BenchmarkTools.Trial: 
  memory estimate:  55.98 KiB
  allocs estimate:  3583
  --------------
  minimum time:     101.712 μs (0.00% GC)
  median time:      104.855 μs (0.00% GC)
  mean time:        108.568 μs (1.06% GC)
  maximum time:     1.061 ms (87.64% GC)
  --------------
  samples:          10000
  evals/sample:     1

Running Julia with --track-allocation shows that the allocations are all in the loop:

        -     total = 0
448806912     while offset < length(data)
896737248         total += data[offset]
224841744         offset += 1
        -     end

If you comment out avail and block, which are never actually called, you will see the allocations go to 0 and a sizable increase in performance.
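
One way to see why (my own diagnostic, not part of the original post): offset is both captured by the closures and reassigned in the loop, so Julia boxes it, and every access to a boxed variable allocates and defeats type inference. @code_warntype makes the box visible:

using InteractiveUtils  # provides @code_warntype (loaded automatically in the REPL)

d = rand(Int, 1024)
# Assuming the test function from above is defined:
@code_warntype test(d, 1)
# In the variable listing, offset should appear as a Core.Box, and the
# arithmetic on it in the loop is no longer inferred as a concrete Int.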

The usual solution to this performance problem, where a closure captures a variable that is later reassigned, is to wrap the variable in a Ref:

function test(data, offset)
    let offset = Ref(offset)
        # offset is now bound to a Ref; the binding itself is never reassigned
        # (only its contents via offset[]), so the closures below do not force boxing.
        avail() = length(data) - offset[] + 1
        block() = reinterpret(UInt32, view(data, offset[]:(offset[]+8)))

        total = 0
        while avail() > 0
            total += data[offset[]]
            offset[] += 1
        end
        return total
    end
end

julia> @benchmark test(d, 1) setup=(d=rand(Int, 1024))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     533.905 ns (0.00% GC)
  median time:      547.989 ns (0.00% GC)
  mean time:        556.012 ns (0.00% GC)
  maximum time:     1.175 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     190
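
If you would rather not thread a Ref through the code, another option (my own sketch, not from the thread; test2 is just an illustrative name) is to pass the changing state to the helpers explicitly, so nothing mutable is captured at all:

function test2(data, offset)
    # The helpers take the current offset as an argument instead of
    # capturing it, so no variable is boxed and the calls can inline.
    avail(off) = length(data) - off + 1
    block(off) = reinterpret(UInt32, view(data, off:(off + 8)))

    total = 0
    while avail(offset) > 0
        total += data[offset]
        offset += 1
    end
    return total
end

This version should also benchmark at 0 allocations, since the only captured variable is data and it is never reassigned.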

Oh my, that’s interesting. Thanks, I’ll use that.