Currently, evaluating the axes of an OffsetArray is considerably slower than evaluating those of an Array, as the former uses a custom axis type that wraps the axes of the parent. I am trying to understand how to improve the performance of this operation.
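For context, the wrapping scheme looks roughly like the following (a simplified, self-contained sketch with hypothetical names such as ToyIdOffsetRange and toy_axes, not the actual OffsetArrays source): each offset axis stores a parent axis together with an integer offset, and the axes of the wrapper are obtained by mapping this constructor over the parent's axes and the stored offsets.

# Simplified sketch of the wrapping scheme (hypothetical names, not the
# actual OffsetArrays source): an offset axis wraps a parent axis plus an offset.
struct ToyIdOffsetRange{T<:Integer,R<:AbstractUnitRange{T}} <: AbstractUnitRange{T}
    parent::R
    offset::T
end
Base.first(r::ToyIdOffsetRange) = first(r.parent) + r.offset
Base.last(r::ToyIdOffsetRange) = last(r.parent) + r.offset

# The axes of the wrapper array are then a map of this constructor over the
# parent's axes and the stored offsets (all other array machinery is omitted):
toy_axes(parentaxes, offsets) = map(ToyIdOffsetRange, parentaxes, offsets)

For a 6-dimensional parent A, toy_axes(axes(A), (-1, -2, -3, 1, 2, 3)) builds a 6-tuple of such wrapper ranges, which is the shape of computation whose cost is in question below.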
To give an example (using the master branch of OffsetArrays.jl):
julia> using OffsetArrays, BenchmarkTools
julia> X = rand(4, 4, 4, 4, 4, 4);
julia> XO = OffsetArray(X, -1, -2, -3, 1, 2, 3);
julia> @btime axes($X);
4.286 ns (0 allocations: 0 bytes)
julia> @btime axes($XO);
9.056 ns (0 allocations: 0 bytes)
Axes in either case are constructed using a map, and the timing of each map call matches that of the corresponding axes call, as expected.
# Axes for an Array
julia> @btime map(Base.OneTo, size($X));
4.288 ns (0 allocations: 0 bytes)
# Axes for an OffsetArray
julia> @btime map(OffsetArrays.IdOffsetRange, axes(parent($XO)), $XO.offsets);
9.310 ns (0 allocations: 0 bytes)
The performance penalty appears to arise in constructing the IdOffsetRange axes, and not in evaluating the axes of the parent. We may check this as follows:
julia> axOp = axes(parent(XO));
julia> @btime map(OffsetArrays.IdOffsetRange, $axOp, $XO.offsets);
9.036 ns (0 allocations: 0 bytes)
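(As an additional sanity check, one could also time the lookup of the parent's axes on its own; I would expect this to be close to the plain-Array timing above, which would confirm that the extra cost lies in the wrapping rather than in the lookup.)
julia> @btime axes(parent($XO));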
Next, we check the performance of the individual constructors:
julia> @btime OffsetArrays.IdOffsetRange($(Ref(Base.OneTo(4)))[], $(Ref(0))[]);
3.217 ns (0 allocations: 0 bytes)
julia> @btime Base.OneTo($(Ref(4))[]);
2.717 ns (0 allocations: 0 bytes)
# Constructing tuples of these types
julia> @btime ntuple(x->$(Ref(Base.OneTo(4)))[], 6);
3.221 ns (0 allocations: 0 bytes)
julia> @btime ntuple(x->$(Ref(OffsetArrays.IdOffsetRange(Base.OneTo(4),0)))[], 6);
4.261 ns (0 allocations: 0 bytes)
I'm not sure there's much of a difference here. Why is the map so slow, and how can I improve the performance of this operation?
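In case it helps with the diagnosis, the inference results and the generated code for the two cases can be compared with something like the following (output omitted here):
julia> @code_warntype map(OffsetArrays.IdOffsetRange, axes(parent(XO)), XO.offsets)
julia> @code_native axes(XO)
julia> @code_native axes(X)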