Stack-allocated arrays without excessive compilation time

@Tamas_Papp This example requires PaddedMatrices’ master branch, but I should make a new release within the next few days.
It doesn’t support Dual numbers yet, so you may also want to either request that I add support for that, or take the idea for your own implementation of this approach.

Example:

julia> using PaddedMatrices, VectorizationBase, ArrayInterface

julia> using VectorizationBase: StridedPointer

julia> @noinline function foo(A)
           A[1] = 1 / A[1]
           A[1] + A[2]
       end
foo (generic function with 1 method)

julia> function make_stride_dynamic(p::StridedPointer{T,N,C,B,R}) where {T,N,C,B,R}
           StridedPointer{T,N,C,B,R}(p.p, map(Int, p.strd), p.offsets)
       end
make_stride_dynamic (generic function with 1 method)

julia> function make_dynamic(A::PtrArray)
           PtrArray(make_stride_dynamic(stridedpointer(A)), Base.size(A), ArrayInterface.dense_dims(A))
       end
make_dynamic (generic function with 1 method)

julia> function make_dynamic(A::StrideArray)
           StrideArray(make_dynamic(PtrArray(A)), A.data)
       end
make_dynamic (generic function with 2 methods)

julia> function bar(M, N, iter = 100)
           S = @StrideArray rand(M, N)
           D = make_dynamic(S)
           s = 0.0
           for i ∈ 1:iter
               # macro currently doesn't look through nested expressions
               s += @gc_preserve foo(D)
           end
           s
       end
bar (generic function with 2 methods)

julia> @allocated bar(StaticInt(1), StaticInt(4), 100)
0

julia> @allocated bar(StaticInt(2), StaticInt(4), 100)
0

julia> @allocated bar(StaticInt(3), StaticInt(4), 100)
0

julia> @allocated bar(StaticInt(4), StaticInt(4), 100)
0

julia> @allocated bar(StaticInt(5), StaticInt(4), 100)
0

julia> @allocated bar(StaticInt(2), StaticInt(5), 100)
0

julia> @allocated bar(StaticInt(2), StaticInt(6), 100)
0

julia> @allocated bar(StaticInt(2), StaticInt(7), 100)
0

julia> @allocated bar(StaticInt(2), StaticInt(8), 100)
0

julia> @allocated bar(StaticInt(2), StaticInt(9), 100)
0

julia> using MethodAnalysis

julia> mis = methodinstances(foo)
1-element Vector{Core.MethodInstance}:
 MethodInstance for foo(::PtrArray{Tuple{Int64, Int64}, (true, true), Float64, 2, 1, 0, (1, 2), Tuple{Int64, Int64}, Tuple{StaticInt{1}, StaticInt{1}}})

A rundown of what bar is doing:

  1. Creates a random array. If the dimensions are StaticInts, this array is statically sized and will (given certain conditions) be stack allocated.
  2. It makes the array dynamic, by discarding the static size and and stride information. The static tuple information remains.
  3. Repeatedly calls foo. foo is not inlined (@noinline), and foo also mutates the array.

However, the @gc_preserve macro (which is still primitive, it doesn’t actually descend/walk expressions…) GC.@preserves the memory backing the array and sheds the last bit of static tuple size info, before passing the array to foo.

Thus, @allocated shows 0.
After trying a variety of different sizes, I call methodinstances to confirm that only a single foo method has been compiled, because the arrays are indeed dynamically sized.

3 Likes