Why do these two similar functions lead to different timings?

I am in the process of porting some Matlab code that implements the vector field navigation approach outlined in Goncalves et al. 2010, leveraging AD. The code works properly, but I am confused about why two seemingly equivalent implementations result in significantly different timings and numbers of allocations.

VF_grad() takes the function gradients as arguments, while VF_func() takes the original functions and computes the gradients itself.

using ForwardDiff
using StaticArrays
using LinearAlgebra
using BenchmarkTools

α1(x::AbstractArray) = (x[1]/13.0).^2.0 .+ (x[2]/7.0).^2.0 .- 1.0
α2(x::AbstractArray) = x[3]
V(x::AbstractArray)  = -sqrt(α1(x)^2.0 + α2(x)^2.0);

∇α1(x::AbstractArray) = ForwardDiff.gradient(α1,x)
∇α2(x::AbstractArray) = ForwardDiff.gradient(α2,x)
∇V(x::AbstractArray)  = ForwardDiff.gradient(V,x);

function VF_grad(x::AbstractArray, ∇α1::Function, ∇α2::Function, ∇V::Function, G::Real=1.0, H::Real=1.0)
    ∇α1x = ∇α1(x)
    ∇α2x = ∇α2(x)
    circ = ∇α1x × ∇α2x
    converge = ∇V(x)

    G*converge + H*circ
end

function VF_func(x::AbstractArray, α1::Function, α2::Function, V::Function, G::Real=1.0, H::Real=1.0)
    ∇α1x = ForwardDiff.gradient(α1,x)
    ∇α2x = ForwardDiff.gradient(α2,x)
    circ = ∇α1x × ∇α2x
    converge = ForwardDiff.gradient(V,x)

    G*converge + H*circ
end

## Make Data
eval_array = [SVector(rand(3)...) for i ∈ 1:750];
@btime VF_func.($eval_array, α1, α2, V)
@btime VF_grad.($eval_array, ∇α1, ∇α2, ∇V);

259.922 μs (9004 allocations: 275.58 KiB)
 53.775 μs (4 allocations: 17.77 KiB)

Can somebody explain this behavior to me? Both look to be type-stable.

For some reason this behavior seems connected to ×. Changing circ = ∇α1x × ∇α2x to circ = ∇α1x leads to

51.266 μs (4 allocations: 17.77 KiB)
51.253 μs (4 allocations: 17.77 KiB)

Also, if I use the original definition of circ but modify my return to G*converge + circ, i.e. no H* I get

51.489 μs (4 allocations: 17.77 KiB)
51.497 μs (4 allocations: 17.77 KiB)

I think Julia is not fully specializing VF_func. You could try to force it to specialize on the function arguments like this:

function VF_func2(x::AbstractArray, α1::F1, α2::F2, V::F3, G::Real=1.0, H::Real=1.0) where {F1,F2,F3}
    ∇α1x = ForwardDiff.gradient(α1,x)
    ∇α2x = ForwardDiff.gradient(α2,x)
    circ = ∇α1x × ∇α2x
    converge = ForwardDiff.gradient(V,x)

    G*converge + H*circ
end

Note that Julia usually does a good job deciding what to specialize, so you rarely need to force it.

I think the heuristic is related to whether the function argument is called directly in the function body or not, but I don’t know the details.
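A minimal, self-contained sketch of that heuristic (the function names here are hypothetical, not from this thread):

```julia
# `f` is called directly in the body, so Julia specializes on typeof(f).
call_it(f, x) = f(x)

# `f` is only passed along as an argument, never called here, so by
# default Julia may compile a single generic method shared by all `f`.
pass_it(f, x) = call_it(f, x)

# A type parameter forces specialization even when `f` is never
# called directly in the body.
pass_it_forced(f::F, x) where {F} = call_it(f, x)

# All three compute the same value; the difference is only in what
# code Julia generates for each concrete `f`.
call_it(sin, 0.0), pass_it(sin, 0.0), pass_it_forced(sin, 0.0)
```

The specialization difference does not affect results, only the generated code; it can be inspected with @code_typed or by comparing allocation counts, as in the benchmarks above.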

Another approach: return a callable function (since you presumably won’t be changing α1, α2, and V for each x), then call it:
Edit: sped up ~50% by adding @inline annotations.

function VF(α1, α2, V, G=1.0, H=1.0)
    @inline ∇α1(x) = ForwardDiff.gradient(α1,x)
    @inline ∇α2(x) = ForwardDiff.gradient(α2,x)
    @inline converge(x) = ForwardDiff.gradient(V,x)

    x -> G*converge(x) + H*(∇α1(x) × ∇α2(x))
end
@btime VF(α1, α2, V).($eval_array)    #  12.693 μs (7 allocations: 17.84 KiB)
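The closure pattern works with any gradient implementation, not just ForwardDiff. Here is a self-contained sketch using hand-coded analytic gradients of the same surfaces (the names a1, a2, g1, g2, gV, and VF2 are hypothetical stand-ins for the thread's ForwardDiff versions):

```julia
using LinearAlgebra

# The surfaces from the question and their analytic gradients:
# α1(x) = (x[1]/13)^2 + (x[2]/7)^2 - 1,  α2(x) = x[3],
# V(x)  = -sqrt(α1(x)^2 + α2(x)^2)
a1(x) = (x[1]/13)^2 + (x[2]/7)^2 - 1
a2(x) = x[3]
g1(x) = [2x[1]/169, 2x[2]/49, 0.0]                              # ∇α1
g2(x) = [0.0, 0.0, 1.0]                                         # ∇α2
gV(x) = -(a1(x)*g1(x) + a2(x)*g2(x)) / sqrt(a1(x)^2 + a2(x)^2)  # ∇V

# Same closure idea as VF above, but taking gradient functions directly:
function VF2(∇α1, ∇α2, ∇V, G=1.0, H=1.0)
    x -> G*∇V(x) + H*(∇α1(x) × ∇α2(x))
end

vf = VF2(g1, g2, gV)
vf([1.0, 2.0, 3.0])
```

Because the gradient functions are captured in the closure, each call site sees concrete function types and specializes fully, just like the VF version above.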

@jw3126 This works, but I am having a hard time understanding why. Are α1, α2, and V different subtypes of Function? Running typeof(α1) seems to indicate that: it returns typeof(α1).

Because my function requires α1, α2, and V to be callable, would I be best off using something like

where {F1<:Function,F2<:Function,F3<:Function}

Or is there a more generic way to limit the arguments to types that are “callable”? E.g. these functions should be able to handle custom types like ODESolution from DifferentialEquations.jl, since they are callable, but I expect they are not subtypes of Function.
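For illustration, a self-contained sketch of a callable object that is not a subtype of Function (the type name is hypothetical):

```julia
# A callable struct: instances can be called like functions, but the
# type itself is not a subtype of Function.
struct Scaled{F}
    f::F
    c::Float64
end
(s::Scaled)(x) = s.c * s.f(x)

s = Scaled(abs, 2.0)
s(-3.0)                 # works like a function call
Scaled <: Function      # false
```

An argument constrained with ::F3 where {F3<:Function} would reject objects like this, while the unconstrained version accepts them.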

@kristoffer.carlsson I think you are probably correct. When I add any combination of

a = α1(x)
b = α2(x)
c = V(x)

to my function, I see reduced timings and allocation counts even though I am increasing the number of operations.


That limits it to functions but prevents general callable objects. I would argue it is best to just remove the <:Function constraint and use duck typing instead.

I’m not sure I know what you mean. Can you elaborate? Does this mean adding logic in my functions based on α1 being callable? Something like


This works for me too and makes sense. However, I see no speed-up using @inline vs. not. I am using Julia v0.7. I thought it might be due to different optimization flags, so I tried -O3, but saw no change.

Just assume that the thing passed in is callable and if it isn’t, there will be a runtime error.
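In other words (a minimal sketch; the names are hypothetical):

```julia
# No type constraint on `f`: anything callable works, and anything
# that is not callable fails with a MethodError at the call site.
apply_twice(f, x) = f(f(x))

apply_twice(sqrt, 16.0)   # works: 2.0
# apply_twice(1, 2)       # MethodError: objects of type Int64 are not callable
```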


That’s simple enough! Thanks.