I’ve been experimenting with the techniques described in Multidimensional algorithms and iteration for writing some generic multi-dimensional operations.
Let’s look at a NaN-skipping sum:
function nansum!(B, A)
    # It's assumed that B has size 1 along any dimension that we're summing,
    # and otherwise matches A
    fill!(B, NaN)
    Bmax = last(CartesianIndices(B))
    @inbounds for I in CartesianIndices(A)
        dest = min(Bmax, I)
        ai = A[I]
        bi = B[dest]
        B[dest] = ifelse(isnan(ai), bi, ifelse(isnan(bi), ai, bi + ai))
    end
    return B
end
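The trick here is that min on two CartesianIndexes works elementwise, so B[min(Bmax, I)] clamps I to 1 along any dimension where B has size 1 and leaves the other dimensions untouched. For example:

julia> min(CartesianIndex(57, 23, 11), CartesianIndex(1, 40, 60))
CartesianIndex(1, 23, 11)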
julia> using BenchmarkTools

julia> A = rand([NaN, 3.4, 4.5], 100, 40, 60);

julia> B = Array{Float64}(undef, 1, 40, 60);

julia> @btime nansum!($B, $A);
  788.930 μs (0 allocations: 0 bytes)
This works, and handles arbitrary dimensions. Cool!
nansum2! below is hard-coded for the three-dimensional case with dims=1, i.e. reducing only the first dimension:
function nansum2!(B::AbstractArray{T,3}, A::AbstractArray{T,3}) where T
    fill!(B, NaN)
    @inbounds for k in 1:size(A, 3)
        for j in 1:size(A, 2)
            for i in 1:size(A, 1)
                ai = A[i, j, k]
                bi = B[1, j, k]
                B[1, j, k] = ifelse(isnan(ai), bi, ifelse(isnan(bi), ai, bi + ai))
            end
        end
    end
    return B
end
julia> @btime nansum2!($B, $A);
  450.433 μs (0 allocations: 0 bytes)
The hard-coded version is significantly faster, which is a shame. It seems that because the generic solution doesn’t know at compile time which dims have size 1, there is a real cost: it has to go through the min on every iteration.
Does anyone have suggestions for how to keep the simplicity and genericity of the first version, but with genuinely no overhead? It’ll have to dispatch / recompile based on the reduction dims; I’m just not sure what the nicest way to code that is.
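One direction I can imagine (just a sketch of the dispatch idea; nansum3! and _nansum3! are made-up names) is to compute which dims are being reduced at runtime, wrap that tuple in a Val, and use it as a function barrier so the inner loop specializes on the reduction pattern:

function nansum3!(B, A)
    # Which dims are being reduced? (size 1 in B but not in A)
    reduced = ntuple(d -> size(B, d) == 1 && size(A, d) > 1, ndims(A))
    # Constructing the Val is a dynamic dispatch, paid once per call;
    # the kernel below then compiles specialized code for these dims.
    return _nansum3!(B, A, Val(reduced))
end

function _nansum3!(B, A, ::Val{reduced}) where {reduced}
    fill!(B, NaN)
    @inbounds for I in CartesianIndices(A)
        # `reduced` is a compile-time constant, so this ntuple unrolls
        # and the branch disappears: no per-iteration min.
        dest = CartesianIndex(ntuple(d -> reduced[d] ? 1 : I[d], Val(ndims(A))))
        ai = A[I]
        bi = B[dest]
        B[dest] = ifelse(isnan(ai), bi, ifelse(isnan(bi), ai, bi + ai))
    end
    return B
end

This keeps the kernel fully generic, at the cost of one dynamic dispatch per call and one recompile per distinct reduction pattern, but maybe there’s a cleaner idiom?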