While the individual indices are small (usually of size 2-8), the dimension of the array/tensor (the number of indices) is quite large, up to 200. For dimensions > 32 I run into some inference limits that lead to massive allocations, maybe related to this: https://github.com/JuliaLang/julia/pull/27398 ?
This seems to affect every part of the AbstractArray interface, e.g. inference of CartesianIndices.
Chuckling because I didn’t bother to check that you were using lazy arrays, so I immediately tried to make a 33-dimensional Array and hit an OutOfMemoryError. Limiting every axis to length 1 works, though, and I can corroborate what you’re seeing. getindex is type-stable if the index arguments are a splatted sequence of integers, getindex(A, fill(1, 33)...) (though the splatting itself allocates), but not if the argument is a CartesianIndex of length > 32, specifically getindex(A, first(CartesianIndices(A))). With A::Array, I’m seeing the compiler give up on inferring the concrete integer tuple type returned by Base.to_indices; it infers just Tuple.
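For concreteness, here is a minimal probe of that difference (a sketch; the exact inferred types depend on the Julia version):

```julia
# 33-dimensional Array with every axis of length 1, so it fits in memory.
A = ones(ntuple(_ -> 1, Val(33))...)

# Splatted integers: the value itself comes back fine.
@assert A[fill(1, 33)...] == 1.0

# So does CartesianIndex indexing at runtime; the problem is only inference.
@assert A[first(CartesianIndices(A))] == 1.0

# Compare what the compiler infers for the two indexing styles:
Base.return_types(getindex, (typeof(A), ntuple(_ -> Int, Val(33))...))
Base.return_types(getindex, (typeof(A), CartesianIndex{33}))
```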
The issue seems to be specifically with map over a long Tuple, whereas the equivalent broadcast operation is type-inferred. Using broadcast in CartesianIndices construction and indexing might improve inference in this case (although I’m unsure whether that’s a good idea in general).
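A quick way to compare the two is Base.return_types (again a sketch; the reported types vary by Julia version):

```julia
f_map(t)   = map(i -> i + 1, t)
f_bcast(t) = (i -> i + 1).(t)   # broadcast over a tuple also returns a tuple

t = ntuple(_ -> 0, Val(64))

# The runtime results agree either way:
@assert f_map(t) == f_bcast(t)

# But the inferred return types can differ; inspect them with:
Base.return_types(f_map, (typeof(t),))    # map over a 64-tuple
Base.return_types(f_bcast, (typeof(t),))  # broadcast over the same tuple
```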
I was going to respond “surely not, an SVector is just an NTuple”, but as it turns out…
julia> function f2(x)
return map(i -> i + 1, x)
end
f2 (generic function with 1 method)
julia> @code_warntype f2(ntuple(_ -> 0, Val(64)))
MethodInstance for f2(::NTuple{64, Int64})
from f2(x) @ Main REPL[64]:1
Arguments
#self#::Core.Const(f2)
x::NTuple{64, Int64}
Locals
#13::var"#13#14"
Body::Tuple
1 ─ (#13 = %new(Main.:(var"#13#14")))
│ %2 = #13::Core.Const(var"#13#14"())
│ %3 = Main.map(%2, x)::Tuple
└── return %3
julia> @code_warntype f2(@SVector(zeros(Int, 64)))
MethodInstance for f2(::SVector{64, Int64})
from f2(x) @ Main REPL[64]:1
Arguments
#self#::Core.Const(f2)
x::SVector{64, Int64}
Locals
#13::var"#13#14"
Body::SVector{64, Int64}
1 ─ (#13 = %new(Main.:(var"#13#14")))
│ %2 = #13::Core.Const(var"#13#14"())
│ %3 = Main.map(%2, x)::SVector{64, Int64}
└── return %3
Apparently map works better for SVector than for NTuple, but SVectors are dispatched as AbstractVectors, so I can’t use them for CartesianIndices.
BUT:
I read up on generated functions, and they let me circumvent the splatting limit in ind2sub for linear indices:
@generated function ind2sub(t::NTuple{N,Int}, i::Int) where {N}
    ex = :(i -= 1)
    for j in 1:N
        Mj = Base.Cartesian.inlineanonymous(:M, j)
        ex = quote
            $ex
            $Mj = mod(i, t[$j]) + 1
            i = i ÷ t[$j]
        end
    end
    Mjs = [Base.Cartesian.inlineanonymous(:M, j) for j in 1:N]
    ex = quote
        $ex
        return $(Expr(:tuple, Mjs...))
    end
    return ex
end
This manually unrolls ind2sub and is reasonably fast! I’m not sure how much faster/slower it is than cartesian indexing though.
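As a sanity check, the arithmetic it unrolls should agree with CartesianIndices. Here is a plain, non-generated reference (my own helper, not part of the code above) doing the same column-major conversion:

```julia
# Reference implementation of the same column-major linear-to-cartesian
# conversion; deliberately simple (and not type-stable), only for
# checking results on small shapes.
function ind2sub_ref(t::NTuple{N,Int}, i::Int) where {N}
    i -= 1
    out = Vector{Int}(undef, N)
    for j in 1:N
        out[j] = mod(i, t[j]) + 1
        i = i ÷ t[j]
    end
    return Tuple(out)
end

dims = (2, 3, 4)
for i in 1:prod(dims)
    @assert ind2sub_ref(dims, i) == Tuple(CartesianIndices(dims)[i])
end
```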
The map methods in tuple.jl aggressively inline the result tuple, (f(t[1]), f(t[2]), f(t[3]), ...), up to 31 elements, beyond which there is a method for the alias Base.Any32 that just makes a vector and splats it into a tuple, which does not preserve the length in the inferred type. The stated reason is to “avoid code blowup”, presumably meaning compiling progressively longer recursively unrolled loop code for every concrete Tuple type.
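The cutoff is easy to probe (treat the inference part as a sketch, since the reported types depend on the Julia version):

```julia
t31 = ntuple(_ -> 1, Val(31))
t33 = ntuple(_ -> 1, Val(33))

# At runtime the length is preserved either way:
@assert length(map(identity, t33)) == 33
@assert map(identity, t33) == t33

# But the inferred return types differ across the cutoff; compare:
Base.return_types(map, (typeof(identity), typeof(t31)))  # below the cutoff
Base.return_types(map, (typeof(identity), typeof(t33)))  # Any32 fallback
```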
StaticArrays’ mapreduce.jl, on the other hand, has a generated function _map that fixes the size, but it is only dispatched to if either of the first two array arguments is a StaticArray. For example, map((a, b, i) -> i + 1, zeros(Int, 64), zeros(Int, 64), @SVector(zeros(Int, 64))) falls back to Base.map and makes a Vector.
I wonder if all this work would be unnecessary if we could make an undef NTuple and initialize it in a loop that doesn’t get unrolled, exposed to us as an unsafe macro block that disallows assignments to other variables, so there’s no capability for actual mutability. Or perhaps we could use Accessors.jl to semantically construct new tuples with progressively more initialized elements, and have the compiler turn that into a single loop.
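For what it’s worth, Base already has a non-mutating Base.setindex for tuples (and I believe Accessors.jl lowers tuple updates to something similar). Today it still unrolls and hits the same inference cliff for long tuples, but a short tuple illustrates the semantics I mean:

```julia
# "Functional" initialization of a tuple: each Base.setindex call returns
# a NEW tuple with one element replaced, so there is no real mutation.
function squares4()
    t = ntuple(_ -> 0, Val(4))
    for j in 1:4
        t = Base.setindex(t, j^2, j)
    end
    return t
end

@assert squares4() == (1, 4, 9, 16)
```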