@Tamas_Papp This example requires PaddedMatrices’ master branch, but I should make a new release within the next few days.
It doesn’t support Dual
numbers yet, so you may also want to either request that I add support for that, or take the idea for your own implementation of this approach.
Example:
julia> using PaddedMatrices, VectorizationBase, ArrayInterface
julia> using VectorizationBase: StridedPointer
julia> @noinline function foo(A)
A[1] = 1 / A[1]
A[1] + A[2]
end
foo (generic function with 1 method)
julia> function make_stride_dynamic(p::StridedPointer{T,N,C,B,R}) where {T,N,C,B,R}
StridedPointer{T,N,C,B,R}(p.p, map(Int, p.strd), p.offsets)
end
make_stride_dynamic (generic function with 1 method)
julia> function make_dynamic(A::PtrArray)
PtrArray(make_stride_dynamic(stridedpointer(A)), Base.size(A), ArrayInterface.dense_dims(A))
end
make_dynamic (generic function with 1 method)
julia> function make_dynamic(A::StrideArray)
StrideArray(make_dynamic(PtrArray(A)), A.data)
end
make_dynamic (generic function with 2 methods)
julia> function bar(M, N, iter = 100)
S = @StrideArray rand(M, N)
D = make_dynamic(S)
s = 0.0
for i ∈ 1:iter
# macro currently doesn't look through nested expressions
s += @gc_preserve foo(D)
end
s
end
bar (generic function with 2 methods)
julia> @allocated bar(StaticInt(1), StaticInt(4), 100)
0
julia> @allocated bar(StaticInt(2), StaticInt(4), 100)
0
julia> @allocated bar(StaticInt(3), StaticInt(4), 100)
0
julia> @allocated bar(StaticInt(4), StaticInt(4), 100)
0
julia> @allocated bar(StaticInt(5), StaticInt(4), 100)
0
julia> @allocated bar(StaticInt(2), StaticInt(5), 100)
0
julia> @allocated bar(StaticInt(2), StaticInt(6), 100)
0
julia> @allocated bar(StaticInt(2), StaticInt(7), 100)
0
julia> @allocated bar(StaticInt(2), StaticInt(8), 100)
0
julia> @allocated bar(StaticInt(2), StaticInt(9), 100)
0
julia> using MethodAnalysis
julia> mis = methodinstances(foo)
1-element Vector{Core.MethodInstance}:
MethodInstance for foo(::PtrArray{Tuple{Int64, Int64}, (true, true), Float64, 2, 1, 0, (1, 2), Tuple{Int64, Int64}, Tuple{StaticInt{1}, StaticInt{1}}})
A rundown of what bar
is doing:
- Creates a random array. If the dimensions are
StaticInt
s, this array is statically sized and will (given certain conditions) be stack allocated.
- It makes the array dynamic, by discarding the static size and and stride information. The static tuple information remains.
- Repeatedly calls
foo
. foo
is not inlined (@noinline
), and foo
also mutates the array.
However, the @gc_preserve
macro (which is still primitive, it doesn’t actually descend/walk expressions…) GC.@preserve
s the memory backing the array and sheds the last bit of static tuple size info, before passing the array to foo
.
Thus, @allocated
shows 0
.
After trying a variety of different sizes, I call methodinstances
to confirm that only a single foo
method has been compiled, because the arrays are indeed dynamically sized.