Cheapest way for creating a CartesianIndex

Hi,

I am trying to write a generic multidimensional algorithm where I need a random access to an ND Array.
I need to define a CartesianIndex with computed coordinates and modify my array using this CI.

Unfortunately it seems that this process allocates some memory:

@benchmark CartesianIndex(Tuple([1,2,3]))

returns

BenchmarkTools.Trial: 10000 samples with 641 evaluations.
 Range (min … max):  193.058 ns …   4.688 μs  ┊ GC (min … max): 0.00% … 94.70%
 Time  (median):     194.683 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   204.164 ns ± 187.748 ns  ┊ GC (mean ± σ):  4.08% ±  4.23%

   ▂▆██▇▅▄▂▁▁  ▁▁▃▃▃▂▁ ▁ ▁▁▁▂▁                                  ▂
  ▅█████████████████████████████▆▇▆▇▆▇▆▆▅▇▅▆▆▅▄▅▄▅▄▃▅▄▄▅▄▄▃▄▄▁▅ █
  193 ns        Histogram: log(frequency) by time        214 ns <

 Memory estimate: 144 bytes, allocs estimate: 3.

I wonder if it is possible to improve the situation without falling back to linear indices…
Any hint ?

You’re allocating in your benchmark because you’re creating an array with [1,2,3]. Use the multi-argument constructor instead.

Can you show an example in your algorithm?

1 Like

True but it is not the only allocation:

julia> a=[1,2,3]

3-element Vector{Int64}:
 1
 2
 3

julia> @benchmark CartesianIndex(Tuple($a))
BenchmarkTools.Trial: 10000 samples with 705 evaluations.
 Range (min … max):  179.196 ns …  4.248 μs  ┊ GC (min … max): 0.00% … 94.41%
 Time  (median):     180.496 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   184.547 ns ± 90.713 ns  ┊ GC (mean ± σ):  1.44% ±  2.79%

   ▂▇██▆▃▁▁▁ ▁▂▂▂▁▃▃▁  ▂▂▁▁ ▁                                  ▂
  ▅██████████████████▇█████▇███████▇██▇▇▆▆▆▇▇▇▆▇▆▅▇▆▆▅▄▅▄▅▅▄▅▆ █
  179 ns        Histogram: log(frequency) by time       197 ns <

 Memory estimate: 64 bytes, allocs estimate: 2.

The algorithm looks like:

function my_algo(ndarray::Array{Int,D}) where {D}
   temp_idx_array=zero(Int,D)
   for ...
       myupdate!(temp_idx_array)
       ndarray(CartesianIndex(Tuple(temp_idx_array))=something
   end

I could compute the corresponding linear index to ndarray but the rest of the algorithmis based on CartesianIndices.

Don’t know, if this helps: the length of the vector is not known at compile time, therefore

julia> @benchmark CartesianIndex(a...) setup=(a=[1, 2, 3])
BenchmarkTools.Trial: 10000 samples with 308 evaluations.
 Range (min … max):  237.338 ns …   8.984 μs  ┊ GC (min … max): 0.00% … 95.64%
 Time  (median):     273.701 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   275.756 ns ± 169.897 ns  ┊ GC (mean ± σ):  1.19% ±  1.92%

  █           █
  █▅▁▁▅▄▄▄▂▁▂▂█▆▂▂▂█▄▃▂▂▂▂▂▃▂▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  237 ns           Histogram: frequency by time          414 ns <

 Memory estimate: 64 bytes, allocs estimate: 2.

julia> @benchmark CartesianIndex(a...) setup=(a=(1, 2, 3))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.100 ns … 43.900 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.300 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.356 ns ±  1.088 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     █
  ▂▁▁▁▁▁▁▁▁▆▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▂ ▂
  1.1 ns         Histogram: frequency by time         1.7 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
2 Likes

Thanks !
The length of the vector is known when the algorithm is applied to the ND array so I could use a parametric algorithm like in my pseudo_example. The problem is then to create the input Tuple without relying on multi-argument Ctor.

The linear index solution is probably much simpler in my case…

Depending on the maximum number of dimensions, using StaticArrays can speed it up:

using StaticArrays
x = SA[1,2,4]
CartesianIndex(Tuple(x))    # 2 ns (0 allocs: 0 bytes)
1 Like

You can just create a zero(CartesianIndex{D}):

julia> zero(CartesianIndex{4})
CartesianIndex(0, 0, 0, 0)

and then add/subtract/multiply:

julia> zero(CartesianIndex{4}) + one(CartesianIndex{4})*2
CartesianIndex(2, 2, 2, 2)

Since CartesianIndex are just fancy tuples & immutable, I usually create whole grids of them and filter the ones out I don’t care about.

The length of the vector is not known on the type level, which causes a type instability for the Tuple, which in turn causes a type instability on CartesianIndex (the dimensionality is part of the type there after all). These type instabilities end up boxing those variables, meaning they need to be heap allocated - 2 type instabilities plus one array equals 3 allocations :slight_smile:

2 Likes

You can make a tuple directly with (1,2,3), without creating an array first. i.e. do CartesianIndex((1,2,3)).

2 Likes

Sorry, I believe we can’t check your MWE due to omissions.

1 Like

Thank you very much for all answers not only providing a neat solution but allowing me to understand what was going on :wink:

explains nicely why I marked @rafael.guerra StaticArrays based proposal as the solution of my problem:

In my code, I can use the same mutable and fixed size MVector x which is modified inside the loop and compute corresponding CartesianIndex(Tuple(x)) for free to access my ND-Array.

Sorry for the pseudo-algo I provided which was not a MWE.

1 Like