Base.Cartesian's @nloops with @simd

The first example on how to use @nloops in the Base.Cartesian documentation is the following:

using Base.Cartesian

A = rand(3,3)
s = 0.0
@nloops 2 i A begin
	s += @nref 2 A i
end

The loop is expanded to

for i_2 = 1:size(A,2)
	for i_1 = 1:size(A,1)
		s += A[i_1, i_2]
	end
end

I would like to add a @simd macro to the inner loop, as explained in the performance section of the manual, so that the above @nloops expands to:

for i_2 = 1:size(A,2)
	@inbounds @simd for i_1 = 1:size(A,1)
		s += A[i_1, i_2]
	end
end

Is there a way to achieve this? Simply adding @inbounds @simd in front of the @nloops does not work.

I don’t think there is a direct way of doing this, but you can take a look at the implementation of @nloops in
https://github.com/JuliaLang/julia/blob/d8a57182f49787055b3a864cc86401988be4bcdf/base/cartesian.jl#L47-L70
and add @simd for dim == 1.
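As a rough sketch of that suggestion, here is a small standalone macro that builds the loop nest itself and annotates only the loop for dim == 1. The name @nloops_simd is hypothetical and not part of Base.Cartesian. It only handles the simple "iterate over the axes of an array" form, and it uses axes(A, d) for the ranges, which is an assumption on my part rather than a copy of the linked code.

```julia
using Base.Cartesian  # only needed for @nref in the usage example

# Hypothetical macro (not part of Base.Cartesian): builds N nested loops
# over the axes of `arr`, with `@inbounds @simd` on the innermost loop only.
macro nloops_simd(N, itersym, arr, body)
    N isa Int || throw(ArgumentError("N must be an integer literal"))
    itersym isa Symbol || throw(ArgumentError("itersym must be a symbol"))
    ex = body
    for dim in 1:N
        itervar = Symbol(itersym, '_', dim)        # i_1, i_2, ...
        header = Expr(:(=), itervar, :(axes($arr, $dim)))
        loop = Expr(:for, header, Expr(:block, ex))
        # dim == 1 is built first and ends up as the innermost loop.
        ex = dim == 1 ? :(@inbounds @simd $loop) : loop
    end
    return esc(ex)
end

# Usage, wrapped in a function so that `s` is a local variable.
function total_nloops_simd(A)
    s = 0.0
    @nloops_simd 2 i A begin
        s += @nref 2 A i
    end
    return s
end
```

Because of @inbounds, this should only be used when the loop variables are the only indices into arrays whose axes match those of A.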

It would be nice to just use the Cartesian macros, but this is probably the way to go. Thanks!

You could write a @simd_inner_loop macro that expands the nested macros, walks through the nested for loops, and annotates only the innermost one. Then you could write @simd_inner_loop @nloops 2 i A ….
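A minimal sketch of what such a macro might look like follows. Both @simd_inner_loop and the helper annotate_innermost are hypothetical names. The sketch assumes the expanded code consists of plain single-variable for loops, and it will annotate every for loop that has no other for loop inside it, including ones written in the loop body.

```julia
using Base.Cartesian

# Hypothetical helper: returns (new_expr, has_loop). Every `for` loop that
# contains no other `for` loop is wrapped in `@inbounds @simd`.
function annotate_innermost(ex)
    if !(ex isa Expr)
        return (ex, false)
    end
    results = [annotate_innermost(a) for a in ex.args]
    new_ex = Expr(ex.head, Any[first(r) for r in results]...)
    has_loop = any(last, results)
    if ex.head === :for
        # No nested loop found below: this is an innermost loop.
        return has_loop ? (new_ex, true) : (:(@inbounds @simd $new_ex), true)
    end
    return (new_ex, has_loop)
end

# Hypothetical macro: expand nested macros (such as @nloops and @nref)
# first, then annotate the innermost loop(s) of the result.
macro simd_inner_loop(ex)
    expanded = macroexpand(__module__, ex)
    return esc(first(annotate_innermost(expanded)))
end

# Usage, wrapped in a function so that `s` is a local variable.
function total_simd_inner(A)
    s = 0.0
    @simd_inner_loop @nloops 2 i A begin
        s += @nref 2 A i
    end
    return s
end
```

Inspecting the output with @macroexpand should show whether the annotation landed where you expect for your own loop nest.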

Very interesting idea! My macro skills would have to evolve first, I think…

I wanted something similar a while back.
The following is somewhat convoluted, but it did work for me.
The idea was to use @nloops for the outer N-1 loops, then explicitly code the innermost loop. The example is for a 3-dimensional A.

@nloops 2 (d->i_{d+1}) (d->1:size(A,d+1)) begin
    @simd for i_1 = 1:size(A,1)
        @inbounds s += @nref 3 A i
    end
end

This also required loosening the type constraint on the internal function Base.Cartesian._nloops to allow anonymous functions for the loop variable names (rather than limit to just Symbol).

- function Base.Cartesian._nloops(N::Int, itersym::Symbol, rangeexpr::Expr, args::Expr...)
+ function Base.Cartesian._nloops(N::Int, itersym::Expr, rangeexpr::Expr, args::Expr...)

Allowing anonymous functions for the loop variable might be useful in its own right.
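If patching Base.Cartesian._nloops is not an option, a variant that seems to work without it is to give the outer loops their own variable name and index A explicitly in the hand-written inner loop. This is only a sketch for a 3-dimensional array; the function name sum3_simd and the loop variable j are made up for illustration.

```julia
using Base.Cartesian

# Hypothetical example: @nloops generates the two outer loops (j_1 runs over
# dimension 2, j_2 over dimension 3); the innermost loop is written by hand.
function sum3_simd(A::AbstractArray{<:Any,3})
    s = zero(eltype(A))
    @nloops 2 j (d -> axes(A, d + 1)) begin
        @simd for i in axes(A, 1)
            @inbounds s += A[i, j_1, j_2]
        end
    end
    return s
end
```

The cost is that @nref can no longer be used directly for the full index, since the first index has a different name from the others.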

This looks like a quick way to solve the OP's problem – thanks!