Macro to sum over short loop

I find myself writing code like this repeatedly

function ke(I::CartesianIndex{d},u::Array{Float64}) where d
   s = 0.
   for i in 1:d
      s+=u[I,i]^2
   end
   return 0.5s
end

and wanted to give Julia’s macros a try. I was hoping to only need to write something like this

ke(I::CartesianIndex{d},u::Array{Float64}) where d = 0.5@sum(d, i -> u[I,i]^2)

I tried a few things, but I can't get the dimension d to work generally, and when I did get it sort of working (by hard-coding the dimension), the resulting code was only half as fast as the for-loop. Is there some way to just generate the for-loop sum automatically?
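To illustrate, a hard-coded version could look something like this (a sketch, not my actual attempt; ke2d is just a placeholder name):

# Rough sketch: the number of terms is hard-coded to 2 inside the macro.
macro sum2(expr)
    i    = expr.args[1]                    # bound variable of the `i -> ...` expression
    body = expr.args[2]                    # the summand
    terms = [:(let $i = $k; $body; end) for k in 1:2]
    return esc(Expr(:call, :+, terms...))  # term(1) + term(2), evaluated in caller scope
end

ke2d(I::CartesianIndex{2}, u::Array{Float64}) = 0.5 * @sum2(i -> u[I,i]^2)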

Note that I’m not using the Base sum function

ke(I::CartesianIndex{d},u::Array{Float64}) where d = 0.5sum(i -> u[I,i]^2,1:d)

since it is around 10 times slower than the for loop.

Would Base.sum on a generator expression work for you?

ke(I::CartesianIndex{D}, u::Array{Float64}) where {D} = 0.5sum(u[I,i]^2 for i in 1:D)

I did not perform extensive benchmarks, but at first sight it looks like this is as fast as your original version.
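For a quick comparison, something along these lines could be used (a rough sketch with BenchmarkTools and an arbitrary 4×4×2 test array; ke_loop and ke_genr are placeholder names so both versions can be defined side by side):

using BenchmarkTools

# Placeholder names for the two versions above, so both can be timed together.
function ke_loop(I::CartesianIndex{d}, u::Array{Float64}) where d
    s = 0.
    for i in 1:d
        s += u[I,i]^2
    end
    return 0.5s
end

ke_genr(I::CartesianIndex{D}, u::Array{Float64}) where {D} = 0.5sum(u[I,i]^2 for i in 1:D)

u = rand(4, 4, 2)              # arbitrary test data: 2 components per 4×4 cell
I = CartesianIndex(2, 2)

@btime ke_loop($I, $u)
@btime ke_genr($I, $u)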


I think you can just do

0.5 * sum(x -> x^2, @view u[I, :])

Also note that you've forgotten to elide bounds checks. The two versions should probably look like this:

function ke1(I::CartesianIndex{d},u::Array{Float64}) where d
   s = 0.
   @inbounds for i in 1:d
      s+=u[I,i]^2
   end
   return 0.5s
end

ke2(I::CartesianIndex{D}, u::Array{Float64}) where {D} =
    0.5sum(@inbounds(u[I,i]^2) for i in 1:D)

That’s neat, but most of the functions are more complex than this, with multiple arrays and adjustments made to I. Also, I just wanted to try out macros…

The reason d is not working generally is that the value of d is determined at type-inference time, which occurs after macro-expansion time. The way to get around this is with generated functions:

https://docs.julialang.org/en/v1/manual/metaprogramming/#Generated-functions-1
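A minimal sketch of the idea (ke_gen is a placeholder name; same assumptions about u as in your ke): inside a generated function, the static parameter D is an ordinary integer while the body runs, so the unrolled expression can be built directly:

@generated function ke_gen(I::CartesianIndex{D}, u::Array{Float64}) where D
    # D is a plain Int at code-generation time, so build u[I,1]^2 + ... + u[I,D]^2 explicitly.
    ex = :(u[I, 1]^2)
    for i in 2:D
        ex = :($ex + u[I, $i]^2)
    end
    return :(0.5 * $ex)
end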


The generator version works and is exactly as fast as the for-loop. Why is the generator so much faster than sum(fun, 1:d)?

Sorry for over-simplifying. I have those in the real code.

It looks like there still remains a function call in sum(fun, 1:d). I’m not sure why everything doesn’t get inlined.

If you define your own simple higher-order summation function (in much the same way as your original implementation, simply abstracting away the body of the loop in a function argument), you can also make everything work fast:

function simple_sum(fun, range)
    s = 0.
    for i in range
        s += fun(i)
    end
    s
end

ke3(I::CartesianIndex{D}, u) where {D} =
    0.5simple_sum(i->@inbounds(u[I,i]^2), 1:D)

In any case, I think higher-order functions are the way to go here, not macros. The performance of these relies, much like in your original implementation, on the compiler being able to unroll (small) loops when the range is known at compile time (i.e. extracted from a type parameter and constant-propagated to the for loop itself). And I would say that it is one of the best features of Julia that method specialization and constant propagation combine so well with optimization techniques featured by LLVM, so that we can have the unrolled loops without having to explicitly resort to metaprogramming techniques.
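If you want to convince yourself that the unrolling actually happens, one quick (non-rigorous) check is to inspect the code generated for a concrete call, assuming here a CartesianIndex{2} and a three-dimensional u:

using InteractiveUtils         # for @code_llvm outside the REPL

u = rand(4, 4, 2)              # arbitrary test data
I = CartesianIndex(2, 2)
@code_llvm ke3(I, u)           # should show straight-line code with no loop back-edge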

Now, you are right that, for the sake of learning metaprogramming techniques, unrolling loops is an interesting use case. (I think the first experiments I did with macros in Julia were along those lines too). For the sake of the example, here are a few things one might say about this in your particular case.

With macros, you would not rely on the compiler unrolling the loop. Instead, you’d directly generate a series of loop body blocks (one for each iteration of the loop). Again, this is probably not the best way to go here, but if you do need macros in other similar cases, try and take a look at the macros defined in Base.Cartesian: @ntuple, @nexprs and their friends can be very powerful. A very simple example would be:

julia> @macroexpand Base.Cartesian.@nexprs 2 i -> s += u[I, i]
quote
    s += u[I, 1]
    s += u[I, 2]
end
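Its sibling @ntuple works the same way but collects the unrolled expressions into a tuple rather than a sequence of statements:

@macroexpand Base.Cartesian.@ntuple 2 i -> u[I, i]    # expands to the tuple (u[I, 1], u[I, 2])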

But, as already said by @WschW, you’d need the dimension parameter to be known at macro expansion time. If you still want to extract this parameter from a parameterized type, then you could use @generated functions, so that your original implementation could be rewritten with an explicitly unrolled loop like so:

@generated function ke4(I::CartesianIndex{D}, u) where {D}
    quote
        s = 0.
        Base.Cartesian.@nexprs $D i -> s += @inbounds(u[I, i]^2)
        0.5s
    end
end

This could be combined, like before, with a higher-order function that helps you abstract away the loop body:

@generated function unrolled_sum(fun, ::Val{D}) where {D}
    quote
        s = 0.
        Base.Cartesian.@nexprs $D i -> s += fun(i)
        s    # return the accumulated sum explicitly
    end
end

ke5(I::CartesianIndex{D}, u) where {D} =
    0.5unrolled_sum(i->@inbounds(u[I,i]^2), Val(D))

Thanks. I appreciate the walkthrough even though it doesn’t seem to be needed for this case.