Avoiding memory allocation with function passed as argument

Consider the code at the bottom. The actual case is more complex, so something that does not depend on the definition of the function A would be appreciated.

This gives the time and allocation numbers below. Presumably the compiler sees through things when functionversion is called inside tryboth.
But there is memory allocation in the second call of functionversion in somethingelse.

  1. Is this caused by type instability? (unknown type signature of the function F)
  2. How should I rewrite the function somethingelse to avoid the memory allocation? Can generated functions be used here?

Thanks!

[ Info: time and memory used by regular version
0.007638 seconds
[ Info: time and memory used by clumsy version
0.004817 seconds
[ Info: memory allocated by clumsy version two
0.017712 seconds (224 allocations: 13.188 KiB, 56.71% compilation time)
0.007679 seconds (3 allocations: 48 bytes)

const T = Float64
const R = 1_000
const C = 10_000

function matrixversion( A::Matrix{T} )
    rv = T(0.0)
    for c āˆˆ axes(A,2)
        for r āˆˆ axes(A,1)
            rv += A[r,c]^2
        end
    end
    rv
end


function functionversion( A::Function, rows::Int, cols::Int )
    rv = T(0.0)
    for c āˆˆ 1:cols
        for r āˆˆ 1:rows
            rv += A(r,c)^2
        end
    end
    rv
end
    

function generateA( X::Matrix{T} )
    A( r::Int, c::Int ) = X[r,c]
    A
end


function tryboth(rows::Int, cols::Int)
    X = randn( rows, cols )
    A = generateA( X )
    @info "time and memory used by regular version"
    matrixversion( X )
    @time matrixversion( X ) 
    @info "time and memory used by clumsy version"
    functionversion( A, rows, cols )
    @time functionversion( A, rows, cols ) 
    return A
end



function somethingelse( F::Function, rows::Int, cols::Int )
    @time functionversion(F, rows, cols )
end

function testme(rows::Int, cols::Int)
    A = tryboth( rows, cols )
    @info "memory allocated by clumsy version two"
    somethingelse( A, rows, cols )
    somethingelse( A, rows, cols )  # how can I make this call use less memory?
end

testme(R,C)

I think the problem is that this A does not receive X as input, so it does not specialize to the type of X, it can return any value. You can see with @code_warntype that rv is type-unstable. Put X as a parameter of A and should fix the problem (or X should be a constant as well).

1 Like

this ā€œcapturesā€ X within A

1 Like

Thanks Leandro. Passing X as an argument is what Iā€™m seeking to avoid.

Probably what you want is to use closures then: https://m3g.github.io/JuliaNotes.jl/stable/closures/

A small rewrite of your code:

julia> function functionversion( A::Function, rows::Int, cols::Int )
           rv = 0.
           for c āˆˆ 1:cols
               for r āˆˆ 1:rows
                   rv += A(r,c)^2
               end
           end
           rv
       end
functionversion (generic function with 1 method)

julia> function tryboth(rows::Int, cols::Int)
           X = randn( rows, cols )
           A(X,i,j) = X[i,j]
           rv = functionversion( (i,j) -> A(X,i,j), rows, cols )
           @time functionversion( (i,j) -> A(X,i,j), rows, cols ) 
           return rv
       end
tryboth (generic function with 1 method)

julia> tryboth(5,5)
  0.000000 seconds
16.97223743278344


1 Like

I see a 3x speed improvement and no additional allocations when removing all types:

[ Info: time and memory used by regular version
  0.033972 seconds
[ Info: time and memory used by clumsy version
  0.038973 seconds
[ Info: memory allocated by clumsy version two
  0.077030 seconds (224 allocations: 13.188 KiB, 65.44% compilation time)
  0.095871 seconds (3 allocations: 48 bytes)
1.000054829638856e7

[ Info: time and memory used by regular version
  0.021514 seconds
[ Info: time and memory used by clumsy version
  0.013041 seconds
[ Info: memory allocated by clumsy version two
  0.035200 seconds (3 allocations: 48 bytes)
  0.021824 seconds (3 allocations: 48 bytes)
9.99549944189594e6

Can you get by without specialization?
Or, if needed, you might be able to apply them differently.

Saw the timings (and zero allocations?) only after my reply.
0 seconds! Is this on a Cray-3, how come?

Youā€™re passing it one way or another, either implicitly (like in the A function you posted) or explicitly (like @lmiq suggested).

Why do you want to avoid passing X as an argument?

The function is just fetching an element from a matrix, and that is certainly much faster 0.03s. What you are measuring there is not the time of the execution of the function, but boxing of variables, etc, caused by the type instability. It is just that @time is not precise enough to measure that execution:

julia> x = rand(10,10);

julia> @time x[1,1]
  0.000004 seconds (1 allocation: 16 bytes)
0.6257467078827637

julia> @btime $x[1,1]
  1.640 ns (0 allocations: 0 bytes)
0.6257467078827637


A more clear benchmark would be:

julia> function tryboth(rows::Int, cols::Int)
                  X = randn( rows, cols )
                  A(X,i,j) = X[i,j]
                  rv = functionversion( (i,j) -> A(X,i,j), rows, cols )
                  return rv
              end
tryboth (generic function with 1 method)

julia> @btime tryboth(5,5)
  127.200 ns (1 allocation: 288 bytes)
20.81950972642394

where the allocation is of course the creation of the X matrix.

2 Likes

Thanks all. This has been helpful. Iā€™ll play around when I get home and follow up.

1 Like

The allocation is caused by compilation for every closure you create (yes, every time you create a closureA, the function which accepts the closure will compile once). This really bugs me before I figure out what is happening here, because when I check LLVM IR I can find no memory allocation instruction (since the allocation is done at code generation stage, not runtime).

So at sometime I create a @noncapture macro to get rid of the allocation (the syntax can be improved, just a proof-of-concept). It should work fine as long as you donā€™t reassign any variable. It looks like this:

function generateA( X::Matrix{T} )
    A = @nocapture X::Matrix{T} (r,c) -> X[r,c]
    A
end

We need to specify the capture variable and itā€™s type. This is the only place we need to modify our code, the result is:

[ Info: time and memory used by regular version
  0.015533 seconds
[ Info: time and memory used by clumsy version
  0.007854 seconds
[ Info: memory allocated by clumsy version two
  0.016303 seconds (3 allocations: 48 bytes)
  0.016718 seconds (3 allocations: 48 bytes)

So how does it work? Itā€™s quite simple. Firstly it rewrites the function A(r,c) = X[r,c] to a non-capture one f(r,c,X) = X[r,c]. Then we create a ā€œNoCaptureClosureā€ type, which is essentially a tuple of capture values and the function. So it looks like roughly:

function generateA(X)
f(r,c,X) = X[r,c]
A = NoCaptureClosure(f,X)

Then we overload the call to A with:

@inline function (cl::NoCaptureClosure)(r,c) = cl.func(cl.X,r,c)

So A(r,c) is expanded to f(X,r,c). Thatā€™s all! Now it has no allocation for compilation at all.

2 Likes

Could you share your @noncapture macro?

In my application, it is a bunch of stuff, the nature of which does not change for each run, but it does change for different runs. Moreover, the place where the function A is created is very different from where it is used.

In other words, I could pass generic extra junk around, but it would be a hassle / ugly.

Thanks @lmiq . This isnā€™t quite my scenario, but it gets me enough in the right direction that Iā€™ll accept it. Thanks again for your help.

1 Like

It shouldnā€™t be hard:

struct NoCaptureClosure{F,Args} <: Function
    f::F
    args::Args
end
function NoCaptureClosure(f::F,args::Args) where{F,Args} 
    NoCaptureClosure{F,Args}(f,args)
end
@inline function (f::NoCaptureClosure)(args...)
    f.f(args...,f.args)
end

macro nocapture(nf...)
    if isempty(nf)
        return :(error())
    else
        nametypes = nf[1:end-1]
        names = [i.args[1] for i in nametypes]
        types = [i.args[2] for i in nametypes]
        f = nf[end]
        if f.head == :(->)
            args = gensym("args")
            if isa(f.args[1],Symbol)
                paralist = :($(esc(f.args[1])),$args)
            else
                paralist = :($(esc.(f.args[1].args)...),$args)
            end
            assigns = [:($(esc(i)) = $(args).$(i)) for i in names]
            funcname = gensym("f_nocapture")
            func = quote
                local function $(funcname)($(paralist.args...))
                    return let $(assigns...)
                        $(esc(f.args[end]))
                    end
                end
            end
            Args = :($(NamedTuple){$(tuple(names...)),$(Tuple){$(esc(types...))}}(tuple($(esc(tuple(names...)...,)))))
            FuncWrap = :($(NoCaptureClosure)($funcname,$Args))
            return quote
                $func
                $FuncWrap
            end
        end
    end
end

Itā€™s still a little inconvenient and buggy, but should work for simple cases. You can try to improve it.

1 Like