Functions of Functions Performance in a Loop

The two loops below compute the same quantity. In the first, the function argument V is handed on to a helper function (a function of a function); in the second, the computation is written out explicitly. The first allocates a significant amount of memory and is slower by a factor of about 5:

using BenchmarkTools, Random  # needed for @btime and Random.seed! below

function Boltzmann(x,V)
    w = exp(-V(x));
    return w;
end

function test_loop1(n,V)
    avg = 0.
    for j in 1:n
        x = randn();
        L = Boltzmann(x,V);
        b = max(1, L)
        avg +=b/n;
    end
    
    return avg;
end

function test_loop2(n,V)
    avg = 0.
    for j in 1:n
        x = randn();
        L = exp(-V(x));
        b = max(1, L)
        avg +=b/n;
    end
    
    return avg;
end


function U(x)
   return 0.5 * x * x; 
end
Random.seed!(100)
@btime test_loop1(10^4,U)
  1.525 ms (60000 allocations: 937.50 KiB)

while

Random.seed!(100)
@btime test_loop2(10^4,U)
  278.115 μs (0 allocations: 0 bytes)

This is a fairly trivial example, and it would not be terrible to have to write out exp(-V(x)) by hand. There are, however, other problems where the intermediate computation is messier, and it would be nice to encapsulate it in a function without taking such a hit. How should I understand this behavior? How can I improve on it?

I would try

function test_loop3(n,V::F) where {F}
    avg = 0.
    for j in 1:n
        x = randn();
        L = Boltzmann(x,V);
        b = max(1, L)
        avg +=b/n;
    end
    
    return avg;
end

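The compiler does not always specialize a method on an argument that is a function. As far as I can tell, the heuristic is whether the function is called in the method body. test_loop1 only passes V along to Boltzmann and never calls it itself, so it is compiled without knowing the concrete type of V; the call to Boltzmann then becomes a dynamic dispatch with an unknown return type, which is where the allocations come from. test_loop2 calls V directly, so it is specialized. With the type parameter F above, you force specialization in either case.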

See the section "Be aware of when Julia avoids specializing" in the Performance Tips chapter of the Julia manual for the documentation of this behavior.
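To see the effect in isolation, here is a minimal sketch that strips the example down to a pass-through helper. The names inner, sum_passthrough, sum_forced and square are made up for this illustration and do not appear in the code above. On a setup like the one in the question I would expect the first @allocated call to report a nonzero number of bytes and the second to report none, but the exact figures depend on the Julia version, so no output is shown.

```julia
# Hypothetical helper: this is the only place where f is actually called.
inner(f, x) = f(x)

# f is only passed through to inner and never called here,
# so Julia may choose not to specialize this method on typeof(f).
function sum_passthrough(f, n)
    s = 0.0
    for i in 1:n
        s += inner(f, Float64(i))
    end
    return s
end

# Identical body, but the type parameter F forces specialization on typeof(f).
function sum_forced(f::F, n) where {F}
    s = 0.0
    for i in 1:n
        s += inner(f, Float64(i))
    end
    return s
end

square(x) = x * x

# Run each once so that compilation is not counted in the measurements.
sum_passthrough(square, 10)
sum_forced(square, 10)

# Compare the bytes allocated by the two variants.
@show @allocated sum_passthrough(square, 10^4)
@show @allocated sum_forced(square, 10^4)
```

Both variants return the same value; only the way they are compiled differs. If the helper is short, another option is to call the function argument once directly in the outer method, which is what test_loop2 effectively does.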