Loop unrolling, type param to macro, generated functions

pint · February 5, 2017, 12:02am

yeah, i don’t seem to know what my question is. below you find 3.

i have a type Mod{B, A} which just encapsulates a UInt64 array.

and a naive function on it:

@inline limbs(b) = cld(b, 31)

mul!{B, A}(r::Mod{B, A}, a::Mod{B, A}, b::Mod{B, A}) = begin
  @inbounds begin
    for i in 1:limbs(B)
      r.n[i] = zero(UInt64)
    end
    for i in 1:limbs(B), j in 1:limbs(B)      
      r.n[i] += a.n[mod(i-j, limbs(B)) + 1] * b.n[j] * (j > i ? UInt64(A) : UInt64(1))
    end
  end
  r
end

the idea here is that every loop bound is known at compile time, so the loops would get unrolled, and the mod and the conditional would not even make it into the native code. however, it seems that i exceeded some unroll threshold, and only the inner loop gets unrolled. the mod hurts especially badly. i tried using an optimized mod, which helps, but unrolling would be even better. so question #1: is there a way to force unrolling?

i tried to use @unroll as in Unroll.jl, or more precisely a customized variant of it. however, it appears that macros are expanded before B is known, so not the value but just a symbol is passed. dead end? question #2: do i give up on macros for this?
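A minimal sketch of why the macro only sees a symbol, assuming Julia 1.x syntax (the thread itself predates `where`). The macro `@argkind` and both functions are hypothetical names made up for this illustration:

```julia
# Hypothetical macro, for illustration only: reports what kind of argument it received.
macro argkind(x)
    # This body runs at macro-expansion time, before any type parameter has a value.
    return x isa Symbol ? "symbol" : "literal"
end

# Inside a method, B is just a name when the macro runs, so the macro receives :B.
seen_in_method(::Val{B}) where {B} = @argkind(B)

# A literal written at the call site reaches the macro as an actual Int.
seen_literal() = @argkind(31)

seen_in_method(Val(31))  # "symbol"
seen_literal()           # "literal"
```

So a macro can unroll over a literal count, but not over a count that only exists as a type parameter.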

i wound up doing a generated function. it makes super optimized code, and the speed significantly improved. however, i seem to recall that generated functions have some problems, maybe with precompiling, but some cursory googling did not help. so question #3: are there any drawbacks to generated functions?

below my creation. a little bit write only, but blazing fast.

@generated mul!{B, A}(r::Mod{B, A}, a::Mod{B, A}, b::Mod{B, A}) = begin
  li = limbs(B)
  # no esc needed: here r, a, b are bound to types, and the quoted code below refers to the arguments by name
  quote
    @inbounds begin
      $([quote
          t = UInt64(0)
          $([ :( t += a.n[$(mod(i-j, li)+1)] * b.n[$j] ) for j = i+1:li]...)
          t *= $(UInt64(A))
          $([ :( t += a.n[$(mod(i-j, li)+1)] * b.n[$j] ) for j = 1:i]...)
          r.n[$i] = t
        end for i = 1:li ]...)
    end
    r
  end
end
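
For readers on current Julia, here is a rough sketch of the same idea in Julia 1.x syntax, checked against the naive loop. The `Mod` struct is a stand-in (the real definition is not shown in the post), and `mul_loop!` and `mul_gen!` are names chosen here so that both versions can coexist:

```julia
# Stand-in for the Mod type described above (an assumption: the real definition is not shown).
struct Mod{B, A}
    n::Vector{UInt64}
end

limbs(b) = cld(b, 31)

# Reference implementation: the naive double loop.
function mul_loop!(r::Mod{B, A}, a::Mod{B, A}, b::Mod{B, A}) where {B, A}
    li = limbs(B)
    fill!(r.n, zero(UInt64))
    for i in 1:li, j in 1:li
        r.n[i] += a.n[mod(i - j, li) + 1] * b.n[j] * (j > i ? UInt64(A) : UInt64(1))
    end
    return r
end

# Generated version: every index is computed from B and A while the code is generated.
@generated function mul_gen!(r::Mod{B, A}, a::Mod{B, A}, b::Mod{B, A}) where {B, A}
    li = limbs(B)
    rows = Expr[]
    for i in 1:li
        # Terms with j > i are summed first, so one multiplication by A covers all of them.
        upper = [:(t += a.n[$(mod(i - j, li) + 1)] * b.n[$j]) for j in i+1:li]
        lower = [:(t += a.n[$(mod(i - j, li) + 1)] * b.n[$j]) for j in 1:i]
        push!(rows, quote
            t = zero(UInt64)
            $(upper...)
            t *= $(UInt64(A))
            $(lower...)
            r.n[$i] = t
        end)
    end
    return quote
        @inbounds begin
            $(rows...)
        end
        return r
    end
end

# limbs(100) == 4, so each value holds four limbs.
a  = Mod{100, 19}(UInt64[1, 2, 3, 4])
b  = Mod{100, 19}(UInt64[5, 6, 7, 8])
r1 = Mod{100, 19}(zeros(UInt64, 4))
r2 = Mod{100, 19}(zeros(UInt64, 4))
mul_gen!(r1, a, b).n == mul_loop!(r2, a, b).n  # the two versions should agree
```

The sketch assumes `r` does not alias `a` or `b`. The generated version writes each `r.n[i]` only after reading its inputs, so the two versions can differ when the arguments alias.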

stevengj · February 5, 2017, 3:16am

ntuple with a Val argument will be completely unrolled at compile time. See Jeff’s trick with the circularshift! function at: https://github.com/stevengj/18S096-iap17/blob/master/lecture3/Types%20and%20Dispatch.ipynb
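
A small sketch of the `ntuple` with `Val` pattern, assuming Julia 1.x, where the argument is written `Val(N)` (in 2017 it was `Val{N}`). The function `dot_unrolled` is a hypothetical example and is not taken from the linked notebook:

```julia
# Hypothetical helper: a dot product over the first N entries of x and y.
function dot_unrolled(x, y, ::Val{N}) where {N}
    # With a Val length, the tuple size is part of the type, so the compiler
    # knows how many terms there are without running a counted loop.
    terms = ntuple(i -> x[i] * y[i], Val(N))
    return sum(terms)
end

dot_unrolled([1, 2, 3], [4, 5, 6], Val(3))  # 32
```

Whether the resulting native code stays fully unrolled for large `N` is worth checking with `@code_native`.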

pint · February 5, 2017, 9:06am

it appears to me that the same threshold applies to this case. see

@code_native circularshiftN!(ones(100), Val{50}())

the loop is not unrolled anymore. i think the smaller cases get unrolled eagerly only because the loop body is very simple. there must be some limit on the “total number of things after unrolling”, possibly similar to the inlining logic. however, maybe this limit is applied before the massive optimization of llvm can take place?

Orbots · April 19, 2017, 11:52pm

A trick I recently used for loop unrolling with macros is to use nested macros.

macro unrollit( n, body )
...
end

macro dowithunroll(n)
  quote
    ...   
    @unrollit( $n,  <stuff to unroll> ) 
    ....
  end
end

function nknown()
   @dowithunroll(31)
end

The top level function calls a macro with a constant known at parse-time. Then that macro can call a more generic unrolling macro with that constant. In my case dowithunroll was a function that I converted to a macro and block quoted.

Would be cleaner if I could keep dowithunroll as a function, since I’m only converting to a macro so I can pass this parse-time constant down to the generic unrolling macro.
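
A runnable sketch of the nested-macro pattern, assuming Julia 1.x. The macro bodies are elided in the post, so everything here (`@unrollit` taking a loop variable, `@sumsquares`, `sumsquares10`) is a hypothetical reconstruction for illustration, not the original code:

```julia
# Hypothetical generic unroller: emits `body` n times with `var` bound to 1, 2, ..., n.
macro unrollit(n, var, body)
    n isa Integer || error("@unrollit needs a literal integer count")
    steps = [quote
                 let $var = $i
                     $body
                 end
             end for i in 1:n]
    return esc(Expr(:block, steps...))
end

# Outer macro: interpolates its literal argument into the inner macro call,
# so @unrollit receives the number itself rather than the symbol `n`.
macro sumsquares(n)
    return esc(quote
        acc = 0
        @unrollit($n, k, (acc += k * k))
        acc
    end)
end

# The constant is written literally here, so it is known when the macros expand.
sumsquares10() = @sumsquares(10)

sumsquares10()  # 385
```

The `$n` in the outer macro is what passes the value down; a plain `n` would hand the inner macro a symbol.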

cstjean · April 20, 2017, 2:05am

I wrote a macro that expands into the generated function: Unrolled.jl (not registered). Your example doesn’t loop over sequences, but over 1:N, so either look at the source and adapt it, or you could implement a CompileTimeUnitRange{A, B}() type and use a helper function. Something like:

mul!{B, A}(r::Mod{B, A}, a::Mod{B, A}, b::Mod{B, A}) = mul!_helper(r, a, b, CompileTimeUnitRange{1, limbs(B)}(), A)
@unroll function mul!_helper(r, a, b, limbs, A)
  @inbounds begin
    @unroll for i in limbs
      r.n[i] = zero(UInt64)
    end
    @unroll for i in limbs
      @unroll for j in limbs
        r.n[i] += a.n[mod(i-j, length(limbs)) + 1] * b.n[j] * (j > i ? UInt64(A) : UInt64(1))
      end
    end
  end
  r
end
