Why is there a performance penalty in evaluating the axes of an OffsetArray?

jishnub · October 17, 2020, 1:34pm

Currently, evaluating the axes of an OffsetArray is considerably slower than evaluating those of an Array, as the former uses a custom axis type (OffsetArrays.IdOffsetRange) that wraps the axes of the parent. I am trying to understand how to improve the performance of this operation.

To give an example (using the master branch of OffsetArrays.jl):

julia> X = rand(4, 4, 4, 4, 4, 4);

julia> XO = OffsetArray(X, -1, -2, -3, 1, 2, 3);

julia> @btime axes($X);
  4.286 ns (0 allocations: 0 bytes)

julia> @btime axes($XO);
  9.056 ns (0 allocations: 0 bytes)

In both cases the axes are constructed using a map and, as expected, benchmarking that map directly gives the same timings as benchmarking axes.

# Axes for an Array
julia> @btime map(Base.OneTo, size($X));
  4.288 ns (0 allocations: 0 bytes)

# Axes for an OffsetArray
julia> @btime map(OffsetArrays.IdOffsetRange, axes(parent($XO)), $XO.offsets);
  9.310 ns (0 allocations: 0 bytes)
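A minimal sketch of what each wrapped axis holds, assuming OffsetArrays.jl is installed. The `parent` and `offset` fields of `IdOffsetRange` (like `XO.offsets` used above) are package internals rather than public API, so treat this as illustrative only. The last line checks that the map from the benchmark rebuilds exactly `axes(XO)`.

```julia
using OffsetArrays

X  = rand(4, 4, 4, 4, 4, 4)
XO = OffsetArray(X, -1, -2, -3, 1, 2, 3)

# Each axis of XO is an IdOffsetRange: the parent's axis plus a stored shift
ax = axes(XO, 1)
println(typeof(ax))    # an OffsetArrays.IdOffsetRange type
println(ax.parent)     # internal field: the wrapped axis of the parent Array
println(ax.offset)     # internal field: the shift for dimension 1, here -1
println(collect(ax))   # the indices themselves: [0, 1, 2, 3]

# Rebuilding the axes with the map from the question gives the same tuple
rebuilt = map(OffsetArrays.IdOffsetRange, axes(parent(XO)), XO.offsets)
println(rebuilt == axes(XO))   # true
```

So every call to axes on an OffsetArray builds one small wrapper per dimension on top of the parent's axes, which is the extra work being timed above.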

The performance penalty appears to arise in constructing the type, and not in evaluating the axes of the parent. We can check this by computing the parent's axes outside the benchmark:

julia> axOp = axes(parent(XO));

julia> @btime map(OffsetArrays.IdOffsetRange, $axOp, $XO.offsets);
  9.036 ns (0 allocations: 0 bytes)

We check the performance of the constructors:

julia> @btime OffsetArrays.IdOffsetRange($(Ref(Base.OneTo(4)))[], $(Ref(0))[]);
  3.217 ns (0 allocations: 0 bytes)

julia> @btime Base.OneTo($(Ref(4))[]);
  2.717 ns (0 allocations: 0 bytes)

# Constructing tuples of these types
julia> @btime ntuple(x->$(Ref(Base.OneTo(4)))[], 6);
  3.221 ns (0 allocations: 0 bytes)

julia> @btime ntuple(x->$(Ref(OffsetArrays.IdOffsetRange(Base.OneTo(4),0)))[], 6);
  4.261 ns (0 allocations: 0 bytes)

There doesn’t seem to be much difference between the constructors here. Why, then, is the map so slow, and how can the performance of this operation be improved?

Tamas_Papp · October 18, 2020, 11:02am

Please try a more involved benchmark; I am not sure that timings on the order of nanoseconds are very meaningful.
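One possible "more involved" benchmark, sketched under the assumption that OffsetArrays.jl and BenchmarkTools.jl are installed: a hypothetical `sumall` kernel that reads every element through the array's own indices. A fixed cost of a few nanoseconds for computing axes then becomes small relative to the 4096 element reads, and any per-access overhead would show up as a ratio between the two timings rather than as an absolute difference.

```julia
using OffsetArrays, BenchmarkTools

# Hypothetical kernel: visits all 4^6 = 4096 elements via CartesianIndices,
# which respects the (possibly offset) axes of A. Each A[I] is bounds-checked.
function sumall(A)
    s = zero(eltype(A))
    for I in CartesianIndices(A)
        s += A[I]
    end
    return s
end

X  = rand(4, 4, 4, 4, 4, 4)
XO = OffsetArray(X, -1, -2, -3, 1, 2, 3)

@btime sumall($X)
@btime sumall($XO)
```

Both calls visit the same elements in the same column-major order, so they return the same sum; only the timings are of interest.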

tim.holy · October 18, 2020, 1:01pm

Yep. LICM (loop-invariant code motion) generally makes this kind of overhead irrelevant for any real-world example.

https://compileroptimizations.com/category/hoisting.htm
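To make the LICM point concrete, here is a sketch in plain Julia (no packages; the function names are made up for illustration). The first version evaluates `axes(A, 1)` once per column as written, and every `A[i, j]` bounds check consults the axes again; the second hoists the loop-invariant axes by hand. LICM performs this kind of transformation automatically, provided the invariant work is visible to the compiler inside the same compiled function.

```julia
# Naive: axes(A, 1) sits inside the outer loop, so as written it is
# evaluated once per column (and each A[i, j] checks bounds against the axes).
function total_naive(A::AbstractMatrix)
    s = zero(eltype(A))
    for j in axes(A, 2)
        for i in axes(A, 1)
            s += A[i, j]
        end
    end
    return s
end

# Hand-hoisted: the loop-invariant axes are computed once, up front.
function total_hoisted(A::AbstractMatrix)
    rows, cols = axes(A)
    s = zero(eltype(A))
    for j in cols
        for i in rows
            s += A[i, j]
        end
    end
    return s
end

A = rand(3, 4)
total_naive(A) == total_hoisted(A)   # true: same elements, same summation order
```

With an OffsetArray argument the same two shapes apply; the claim in this thread is that the compiler usually turns the first into something like the second on its own.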

Raf · October 18, 2020, 2:28pm

There seems to be some overhead when writing to random locations.

Also, in a few places in DynamicGrids.jl I got small performance improvements by indexing into the parent instead of the OffsetArray.
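A sketch of what "indexing into the parent" can look like for writes, assuming OffsetArrays.jl is installed. It relies on the internal `offsets` field (the same one used in the question), which is not public API; the helper names are hypothetical. An offset index maps to a parent index by subtracting the stored offset for that dimension.

```julia
using OffsetArrays

# Write `val` at each (i, j) in `idxs`, going through the OffsetArray wrapper.
function write_offset!(A, idxs, val)
    for (i, j) in idxs
        A[i, j] = val
    end
    return A
end

# Same writes, translated by hand to parent indices.
# `A.offsets` is an internal field of OffsetArray, not a documented API.
function write_parent!(A::OffsetArray, idxs, val)
    P = parent(A)
    o1, o2 = A.offsets
    for (i, j) in idxs
        P[i - o1, j - o2] = val   # offset index minus offset = parent index
    end
    return A
end

A1 = OffsetArray(zeros(4, 4), -1, 2)   # axes (0:3, 3:6)
A2 = OffsetArray(zeros(4, 4), -1, 2)
idxs = [(0, 3), (2, 5), (3, 6)]        # map to parent (1, 1), (3, 3), (4, 4)
write_offset!(A1, idxs, 1.0)
write_parent!(A2, idxs, 1.0)
A1 == A2   # true: both paths write the same elements
```

Whether the second form is actually faster depends on whether the wrapper's indexing gets inlined into the loop; the two should be compared with a benchmark rather than assumed.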

tim.holy · October 18, 2020, 2:37pm

It would be good to document those, as they may be fixable with an @inline. Hoisting requires a place to hoist to (i.e., a caller that accesses the array multiple times), and the code being hoisted has to be in the same compiled blob as that caller.
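A small plain-Julia sketch of the @inline idea, with a made-up helper `getshifted` standing in for a wrapper's indexing method. Marking the helper @inline asks the compiler to paste its body into the caller, so `firstindex(A)` and the bounds check land in the caller's loop, where they become candidates for hoisting. @inline is a request rather than a guarantee, and a function this tiny would likely be inlined anyway; the annotation matters more for larger methods.

```julia
# Hypothetical helper (not from any package): element k positions past the
# first index of A. @inline asks the compiler to inline it into callers.
@inline function getshifted(A::AbstractVector, k::Integer)
    return A[firstindex(A) + k]
end

# The caller accesses A many times, which gives LICM a loop to hoist out of.
function sum_via_helper(A::AbstractVector)
    s = zero(eltype(A))
    for k in 0:length(A)-1
        s += getshifted(A, k)
    end
    return s
end

sum_via_helper([1, 2, 3])   # 6
# To inspect what the compiler actually did, in the REPL:
# @code_llvm debuginfo=:none sum_via_helper(rand(100))
```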

Raf · October 18, 2020, 3:01pm

Sure, I’ll look into it the next time I’m working on that.
