You could write a @simd_inner_loop macro that expands any nested macros, walks down through the nested for loops, and annotates only the innermost one. Then you could write @simd_inner_loop @nloops 2 i A ….
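A rough sketch of what such a macro might look like is below. The names `annotate_innermost` and `total` are made up for illustration, and the approach (calling `macroexpand` inside the macro and re-wrapping the deepest `for` in `@simd`) is only one possible way to do it. It assumes the expanded loops use the simple `for i = range` form that `@simd` accepts.

```julia
using Base.Cartesian

# Hypothetical helper: rewrites `ex` so that the innermost `for` loop on each
# path is wrapped in `@simd`. Returns (new expression, whether a loop was found).
function annotate_innermost(ex, src)
    ex isa Expr || return ex, false
    if ex.head === :for
        body, found = annotate_innermost(ex.args[2], src)
        # A loop containing another loop is kept as an ordinary `for`.
        found && return Expr(:for, ex.args[1], body), true
        # No loop inside the body, so this is an innermost loop.
        return Expr(:macrocall, Symbol("@simd"), src, ex), true
    end
    args = Any[]
    found = false
    for arg in ex.args
        new_arg, hit = annotate_innermost(arg, src)
        push!(args, new_arg)
        found |= hit
    end
    return Expr(ex.head, args...), found
end

# Hypothetical macro: expand nested macros first, then annotate the result.
macro simd_inner_loop(ex)
    expanded = macroexpand(__module__, ex)
    annotated, found = annotate_innermost(expanded, __source__)
    found || error("@simd_inner_loop: no for loop found")
    return esc(annotated)
end

# Illustrative use: sum all elements of a 3-d array.
function total(A::Array{Float64,3})
    s = 0.0
    @simd_inner_loop @nloops 3 i A begin
        @inbounds s += @nref 3 A i
    end
    return s
end
```

Whether the compiler actually vectorizes the loop still depends on the loop body, as with any use of `@simd`.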
I wanted something similar a while back.
The following is somewhat convoluted, but it did work for me.
The idea was to use @nloops for the outer N-1 loops, then explicitly code the innermost loop.
    @nloops 2 (d->i_{d+1}) (d->1:size(A,d+1)) begin
        @simd for i_1 = 1:size(A,1)
            @inbounds s += @nref 3 A i
        end
    end
This also required loosening the type constraint on the internal function Base.Cartesian._nloops so that it accepts an anonymous function for the loop variable names (rather than only a Symbol).
    - function Base.Cartesian._nloops(N::Int, itersym::Symbol, rangeexpr::Expr, args::Expr...)
    + function Base.Cartesian._nloops(N::Int, itersym::Expr, rangeexpr::Expr, args::Expr...)
Allowing anonymous functions for the loop variable might be useful in its own right.
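For reference, the @nloops snippet above is meant to be equivalent to a hand-written nest like the one below, for a 3-d array. This is only a sketch of the intended expansion; the function name `sum3` is made up for illustration, and it runs without any change to Base.

```julia
# Hypothetical function: sums a 3-d array with @simd on the innermost loop only.
function sum3(A::AbstractArray{T,3}) where {T}
    s = zero(T)
    # Outer two loops: what `@nloops 2 (d->i_{d+1}) ...` would generate,
    # with the highest dimension outermost.
    for i_3 = 1:size(A, 3)
        for i_2 = 1:size(A, 2)
            # Innermost loop written out by hand so it can carry @simd.
            @simd for i_1 = 1:size(A, 1)
                @inbounds s += A[i_1, i_2, i_3]
            end
        end
    end
    return s
end
```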