Strange allocations with default positional argument and static arrays

I am wondering why the second and third versions allocate. What am I missing here?

                _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.1.0 (2019-01-21)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using StaticArrays

julia> function testthis()
           function fg(x, g)
               f = x[1]
               return f, g
           end
           @allocated fg(@SVector([1.0,1.0,1.0]),@SVector([2.0,2.02,2.0]))
       end
testthis (generic function with 1 method)

julia> testthis()
0

julia> testthis()
0

julia> using StaticArrays

julia> function testit()
           function fg(x, g= nothing)
               f = x[1]
               return f, g
           end
           @allocated fg(@SVector([1.0,1.0,1.0]),@SVector([2.0,2.02,2.0]))
       end
testit (generic function with 1 method)

julia> testit()
182676

julia> testit()
48

julia> function testnow()
           function fg(x)
               f = x[1]
               return f, nothing
           end
           function fg(x, g)
               f = x[1]
               return f, g
           end
           @allocated fg(@SVector([1.0,1.0,1.0]),@SVector([2.0,2.02,2.0]))
       end
testnow (generic function with 1 method)

julia> testnow()
182628

julia> testnow()
48

Maybe try creating the SVectors first?

This is our most beloved issue 15276 in action again. You can see it by comparing the lowered code for the two versions:

julia> Meta.@lower function testit2()
           @inline function fg(x, g)
               f = x[1]
               return f, g
           end
         #  @inline fg(x) = fg(x, nothing)
           @allocated fg(@SVector([1.0,1.0,1.0]),@SVector([2.0,2.02,2.0]))
       end
julia> Meta.@lower function testit3()
           @inline function fg(x, g)
               f = x[1]
               return f, g
           end
           @inline fg(x) = fg(x, nothing)
           @allocated fg(@SVector([1.0,1.0,1.0]),@SVector([2.0,2.02,2.0]))
       end

The upshot is that inner functions / closures are very fragile with respect to optimization. This is because all the boxing decisions are made in femtolisp, close to the parser, instead of in the much more mature SSAIR optimizer.

This means that boxing decisions depend on surface syntax: the syntax tree, scoping constructs (loops, let-blocks, etc.), and how you happen to write your code, instead of on data-flow analysis over the control-flow graph. This makes it very hard to reason about these early compiler decisions (unless you intimately know the femtolisp lowering code; I don’t).

I cannot tell you which part of your code triggered this, sorry. As a workaround, I recommend simply refactoring your code to avoid inner functions (define a callable struct or a function at top-level scope instead) whenever you hit such issues.
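For instance, here is a sketch of that workaround applied to the example above, moving the closure to top level (`fg_top` and `testtop` are illustrative names, not from the original code):

```julia
using StaticArrays

# Top-level definitions: no closure is created, so lowering has nothing to box.
fg_top(x, g) = (x[1], g)
fg_top(x) = fg_top(x, nothing)  # the "default argument" method, now at top level

function testtop()
    @allocated fg_top(@SVector([1.0, 1.0, 1.0]), @SVector([2.0, 2.02, 2.0]))
end
```

After a warm-up call to compile, `testtop()` should report 0 on an affected Julia version, since the boxing came from the inner-function lowering, not from the default argument per se.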

Lowering-related boxing issues are relatively easy to detect, because they are mostly context insensitive and later optimization passes can never help alleviate them (as opposed to e.g. llvm sometimes deciding to load a NTuple{16, VecElement{UInt8}} byte-by-byte and sometimes using an appropriate 128 bit wide load, depending on the context where your function has been inlined to).

PS. I think the issue with default arguments on inner functions is recursion (function fg calls fg, albeit a different method), and the lowering code that decides on boxing is incapable of statically resolving the dispatch and inlining until the recursion vanishes.
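Concretely, a default positional argument is sugar for an extra, auto-generated method that calls the full one — roughly this (a sketch of the expansion, not the exact lowered code):

```julia
# `function fg(x, g = nothing) ... end` expands to approximately:
function fg(x, g)
    f = x[1]
    return f, g
end
fg(x) = fg(x, nothing)  # auto-generated method: note that `fg` calls `fg`
```

When `fg` is an inner function, that self-call means the closure captures a reference to itself, which is exactly the kind of recursion the lowering-time boxing analysis gives up on.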

Okay, I did not expect this to be a boxing issue, but that issue always creeps up when you don’t expect it. The problem is that I don’t think it’s a question of the function being defined inside my other functions, because that’s not the case in the actual use case. I will try to look at the output of your code and investigate further when I’m at a PC again. Thanks so far 🙂

I can try to get a bit closer to the original setup, but then it’s no longer possible to come up with something that’s minimal in any sense; it will be speculation, or perhaps someone will be able to say “ah, then it must be that!”

I have this that I want to optimize.


    function fletcher_powell_fg_static(∇f, x)
        T = eltype(x)
        theta_x = theta(x)  # theta is defined elsewhere (not shown in this post)

        if ∇f !== nothing
            if x[1]^2 + x[2]^2 == 0
                dtdx1 = T(0)
                dtdx2 = T(0)
            else
                dtdx1 = - x[2] / ( T(2) * pi * ( x[1]^2 + x[2]^2 ) )
                dtdx2 =   x[1] / ( T(2) * pi * ( x[1]^2 + x[2]^2 ) )
            end
            ∇f1 = -2000.0*(x[3]-10.0*theta_x)*dtdx1 +
                200.0*(sqrt(x[1]^2+x[2]^2)-1)*x[1]/sqrt( x[1]^2+x[2]^2 )
            ∇f2 = -2000.0*(x[3]-10.0*theta_x)*dtdx2 +
                200.0*(sqrt(x[1]^2+x[2]^2)-1)*x[2]/sqrt( x[1]^2+x[2]^2 )
            ∇f3 =  200.0*(x[3]-10.0*theta_x) + 2.0*x[3];
            ∇f = @SVector[∇f1, ∇f2, ∇f3]
        end

        fx = 100.0 * ((x[3] - 10.0 * theta_x)^2 + (sqrt(x[1]^2 + x[2]^2) - 1.0)^2) + x[3]^2

        if ∇f === nothing
            return fx
        else
            return fx, ∇f
        end
    end

using a package I’m developing. If I pass that into my minimize function along with a static vector, everything is fine. However, if I add a method fletcher_powell_fg_static(x), then suddenly minimize is no longer allocation-free! And that method signature is not used at all; it’s just that it exists. So I’m curious whether this is some dynamic dispatch that doesn’t happen when there’s only one method of the function, but I know far too little about these internals to debug the issue.

So it’s something like this

using NLSolvers, StaticArrays
function fletcher_powell_fg_static(g, x)
    ...
end
sv3 = @SVector[0.0,0.0,0.0]
@allocated minimize(fletcher_powell_fg_static, @SVector[-0.5, 0.0, 0.0])
@allocated minimize(fletcher_powell_fg_static, @SVector[-0.5, 0.0, 0.0]) # this call doesn't allocate

function fletcher_powell_fg_static(x)
    ...
end
@allocated minimize(fletcher_powell_fg_static, @SVector[-0.5, 0.0, 0.0])
@allocated minimize(fletcher_powell_fg_static, @SVector[-0.5, 0.0, 0.0]) # this call does(!) allocate

Have you considered using dispatch instead of checks on ∇f? I.e. fletcher_powell_fg_static(::Nothing, x) and fletcher_powell_fg_static(∇f, x)? Or is that seemingly benign refactor exactly the situation that hit you?
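A minimal runnable analog of that suggestion, using a hypothetical toy objective rather than the real Fletcher–Powell function:

```julia
# Value-only method: selected when no gradient container is passed.
fp_toy(∇f::Nothing, x) = sum(abs2, x)

# Value-and-gradient method: selected for any non-`nothing` first argument.
fp_toy(∇f, x) = (sum(abs2, x), 2 .* x)
```

With this pattern, `fp_toy(nothing, x)` returns only the value, and the branch is resolved at compile time by dispatch instead of a run-time `∇f == nothing` check.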

Can you quickly check whether this is a type inference / boxing / dynamic dispatch issue? You can do that with either @code_typed or, even better, the excellent Cthulhu.jl. You want to look out for any abstract, vararg, or Core.Box types in methods that you believe to be fully inferable (in numeric code like yours: everywhere). Small unions crop up everywhere due to the new iteration protocol, but are mostly benign and are optimized out in later passes.
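One way to run that check on the earlier closure example (a sketch; the exact IR printout varies across Julia versions):

```julia
function make_fg()
    function fg(x, g = nothing)
        f = x[1]
        return f, g
    end
    return fg
end

fg = make_fg()

# If lowering boxed the self-reference, the closure carries a `Core.Box` field:
captured = map(n -> fieldtype(typeof(fg), n), fieldnames(typeof(fg)))
println(captured)

# The typed IR of a call can then be scanned for `Box` / abstract types:
ct = @code_typed fg([1.0, 2.0, 3.0])
println(ct)
```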

Can you try with julia --check-bounds=no and julia --check-bounds=yes? Bounds checks can cause allocations sometimes (optimizer fails to elide allocations because objects could escape in error path).

Can you try with explicit @inline and @noinline? The inline heuristics are a little finicky, and failure to inline can cause allocations (optimizer fails to elide allocations because it does not see far enough to prove that objects don’t escape).

I will try to look at the things you mentioned, thank you. As I said, I’m having trouble figuring out what to look for, so it’s much appreciated.

I am currently using dispatch. The thing is that some methods might only need the value, some need the first derivative, and some need the second. Since I’m getting allocations whenever I have more than one method, the user has to do the following

function zero_order_f(x)
    ...
end
function first_order_f(x, df) # or (df, x)
    ...
end
function second_order_f(x, df, ddf)
    ...
end

to avoid allocations! What I had hoped to be able to do was just to ask users to define methods with the appropriate number of arguments, so they could simply write

# if they know they only need zero order
function my_f(x)
   ...
end

# If they want to be able to use up to first order methods
function m_ff(x, df=nothing)
   ...
end

# if they want to be able to use up to second order
function m_fff(x, df=nothing, ddf=nothing)
   ...
end

This basically means that if I want users to a) have zero allocations for static arrays and b) be able to use the same objective across methods, they always have to define the function as

function my_function(x, df, ddf)
    ...
end

even if they just want to use simulated annealing or Nelder–Mead.

So I would really appreciate it if I could instead ask them to provide a function with at least a method for (x); it would also need a method for (x, df) if they want to use first-order methods, and a method for (x, df, ddf) if they want to use Newton’s method. But right now, it appears that my code will allocate if I ask them to do this. So I am “stuck” with “always define three positional arguments even if you plan to ignore the last two”, and in my library code I have to write objective(x, nothing, nothing) when I just want to evaluate the value, instead of having objective(x) pick the appropriate method.

Note, in the existing Optim.jl I wouldn’t care, because it doesn’t support StaticArrays at all, but I’m trying to write this in a way such that people can get non-allocating calls if they want.