It looks like there still remains a function call in sum(fun, 1:d)
. I’m not sure why everything doesn’t get inlined.
If you define your own simple higher-order summation function (in much the same way as your original implementation, simply abstracting away the body of the loop in a function argument), you can also make everything work fast:
function simple_sum(fun, range)
s = 0.
for i in range
s += fun(i)
end
s
end
ke3(I::CartesianIndex{D}, u) where {D} =
0.5simple_sum(i->@inbounds(u[I,i]^2), 1:D)
In any case, I think higher-order functions are the way to go here, not macros. The performance of these relies, much like in your original implementation, on the compiler being able to unroll (small) loops when the range is known at compile time (i.e. extracted from a type parameter and constant-propagated to the for loop itself). And I would say that it is one of the best features of Julia that method specialization and constant propagation combine so well with optimization techniques featured by LLVM, so that we can have the unrolled loops without having to explicitly resort to metaprogramming techniques.
Now, you are right that, for the sake of learning metaprogramming techniques, unrolling loops is an interesting use case. (I think the first experiments I did with macros in Julia were along those lines too). For the sake of the example, here are a few things one might say about this in your particular case.
With macros, you would not rely on the compiler unrolling the loop. Instead, you’d directly generate a series of loop body blocks (one for each iteration of the loop). Again, this is probably not the best way to go here, but if you do need macros in other similar cases, try and take a look at the macros defined in Base.Cartesian: @ntuple
, @nexprs
and their friends can be very powerful. A very simple example would be:
julia> @macroexpand Base.Cartesian.@nexprs 2 i -> s += u[I, i]
quote
s += u[I, 1]
s += u[I, 2]
end
But, as already said by @WschW, you’d need the dimension parameter to be known at macro expansion time. If you still want to extract this parameter from a parameterized type, then you could use @generated
functions, so that your original implementation could be rewritten with an explicitly unrolled loop like so:
@generated function ke4(I::CartesianIndex{D}, u) where {D}
quote
s = 0.
Base.Cartesian.@nexprs $D i -> s += @inbounds(u[I, i])
0.5s
end
end
This could be combined, like before, with a higher-order function that helps you abstract away the loop body:
@generated function unrolled_sum(fun, ::Val{D}) where {D}
quote
s = 0.
Base.Cartesian.@nexprs $D i -> s += fun(i)
end
end
ke5(I::CartesianIndex{D}, u) where {D} =
0.5unrolled_sum(i->@inbounds(u[I,i]^2), Val(D))