CartesianIndices performance: 3D faster than 2D?

lntricate · February 6, 2023, 12:06am

I have encountered a strange occurence when testing the performance of CartesianIndices:

using BenchmarkTools

function test(t::Tuple)
  for _ ∈ CartesianIndices(t)
  end
end

@btime test((100,)) # 3.094 ns
@btime test((100, 100)) # 9.531 μs
@btime test((100, 100, 100)) # 7.267 μs
@btime test((100, 100, 100, 100)) # 152.958 ms

There are 2 behaviors here I’m confused about:

The 3-dimensional loop takes less time than the 2-dimensional loop.
The 4-dimensional loop takes over 20,000 times longer than the 3-dimensional loop despite (theoretically) only having to iterate over 100x the indices. Why does it not take 100x longer?

Any help would be appreciated!

Paulo_Jabardo · February 6, 2023, 4:21pm

I don’t know what the compiler is doing but remember that your code is not doing anything and it might be optimized away.

For the first two cases, the for body is optimized away:

julia> @code_llvm test((100,))
;  @ REPL[2]:1 within `test`
define void @julia_test_568([1 x i64]* nocapture nonnull readonly align 8 dereferenceable(8) %0) #0 {
top:
;  @ REPL[2]:3 within `test`
  ret void
}

julia> @code_llvm test((100,100))
;  @ REPL[2]:1 within `test`
define void @julia_test_574([2 x i64]* nocapture nonnull readonly align 8 dereferenceable(16) %0) #0 {
top:
;  @ REPL[2]:3 within `test`
  ret void
}

Now, for test(100,100,100), the code is much larger.

By the way, my timings are:

julia> @btime test((100,))
  2.194 ns (0 allocations: 0 bytes)

julia> @btime test((100,100))
  2.194 ns (0 allocations: 0 bytes)

julia> @btime test((100,100,100))
  3.444 μs (0 allocations: 0 bytes)

julia> @btime test((100,100,100,100))
  60.121 ms (0 allocations: 0 bytes)

Paulo

lntricate · February 6, 2023, 8:16pm

Thanks for the response! Indeed the compiler optimizing away the loop body explains all of these results. After testing the same function with @noinline and a simple addition in the loop body, I got the following:

@btime test((100,)) # 112.413 ns
@btime test((100, 100)) # 12.523 μs
@btime test((100, 100, 100)) # 1.094 ms
@btime test((100, 100, 100, 100)) # 139.926 ms

Each roughly 100 times more than the previous one as expected.

Topic		Replies	Views
Performance issues with CartesianIndices Performance question	5	599	February 24, 2022
Performance of Cartesian Indices and Views General Usage	0	384	February 18, 2020
Iteration with CartesianIndices(array) is slow when the dimension of the array is large Performance indexing	2	814	August 25, 2019
Poor performance with custom multidimensional iterator Performance	4	601	September 16, 2020
Overflow with Iterators.product General Usage question , bug , iterators	7	145	July 2, 2024

CartesianIndices performance: 3D faster than 2D?

Related topics