Is there a `CartesianIndex` using `Int32`?

The definition in Julia's `base/multidimensional.jl` is restricted to `Int64`:

  struct CartesianIndex{N} <: AbstractCartesianIndex{N}
      I::NTuple{N,Int}
      CartesianIndex{N}(index::NTuple{N,Integer}) where {N} = new(index)
  end

Of course, one can quickly implement a subset of the functionality manually for `Int32`, but I was wondering why it is restricted to `Int64` in the first place. Or does the multidimensional indexing machinery already exist in another package?
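For illustration, something along these lines is what I mean by a manual subset (the type name `CIndex` and the selection of methods are just made up for this post):

  # A CartesianIndex-like type with a configurable element type, e.g. Int32.
  struct CIndex{N,T<:Integer}
      I::NTuple{N,T}
  end
  CIndex(I::Integer...) = CIndex(promote(I...))
  CIndex(I::NTuple{N,T}) where {N,T<:Integer} = CIndex{N,T}(I)

  # A few of the operations CartesianIndex supports, all staying in the element type T.
  Base.:+(a::CIndex{N,T}, b::CIndex{N,T}) where {N,T} = CIndex(map(+, a.I, b.I))
  Base.:-(a::CIndex{N,T}, b::CIndex{N,T}) where {N,T} = CIndex(map(-, a.I, b.I))
  Base.getindex(a::CIndex, d::Integer) = a.I[d]
  Base.length(::CIndex{N}) where {N} = N

so that, for example, `CIndex(Int32(1), Int32(2)) + CIndex(Int32(3), Int32(4))` stays an `Int32` index.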

It is using `Int`, which defaults to `Int32` on 32-bit systems and `Int64` on 64-bit systems.
That is generally the most reasonable thing to do, since at the end of the day indexing has to produce a pointer offset that matches the pointer size of the system's hardware.
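For example, on a 64-bit host:

  julia> Int === Int64
  true

  julia> typeof(CartesianIndex(1, 2).I)
  Tuple{Int64, Int64}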

Why do you not want this?

Hm, I am working on GPU programming, where the host system is 64-bit but the kernels should avoid 64-bit integers since they are executed on the GPU. That's why `Int` in this context ends up being `Int64`.

To be concrete, it is this code: SpatialHashTables.jl/src/core.jl at 25bdc1c97c85dfad9255a9b281829fcfe1d2d48e · SteffenPL/SpatialHashTables.jl · GitHub
which is called in a setting like the following:
SpatialHashTables.jl/benchmarks/report/forcebenchmark.jl at 25bdc1c97c85dfad9255a9b281829fcfe1d2d48e · SteffenPL/SpatialHashTables.jl · GitHub

Benchmarks showed that the use of `Int64` is indeed the bottleneck here, which is why I'm looking into this.

Is the GPU architecture 32-bit? In theory, if `Int` were truly architecture-informed, it would be `Int32` there. That's currently not possible though, since `Int` is hardcoded on the host side…

I'm using KernelAbstractions.jl; all of that goes through GPUCompiler.jl, but there it requires explicit annotations to enforce `Int32`. I found this discussion, which seems related:
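As for the explicit annotation: roughly, it means converting every index value by hand, along these lines (a sketch, not the actual kernel from the package; `demo!` is a made-up name):

  using KernelAbstractions

  # @index returns a host-sized Int (Int64 on a 64-bit host), so kernel-local
  # arithmetic only stays 32-bit if every value is explicitly cast to Int32.
  @kernel function demo!(A)
      i = Int32(@index(Global, Linear))
      n = Int32(length(A))
      A[i] = mod1(i + Int32(1), n)   # all arithmetic in Int32; indexing converts back to Int
  end

launched e.g. with `demo!(backend)(A; ndrange=length(A))`.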

Doesn't `to_indices` always take care of the conversion during indexing? It seems possible to wrangle any `Integer` before then.
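For example, on a 64-bit host:

  julia> to_indices(zeros(3), (Int32(2),))
  (2,)

  julia> typeof(to_indices(zeros(3), (Int32(2),)))
  Tuple{Int64}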

Yes, that's the GPU-specific discussion about much the same problem. The core issue is that the size of `Int` is determined by the parser on the host system, not by the target architecture.

In principle yes, but the problem is that the conversion might throw if the integer is larger than `typemax(Int32)`. Because `CartesianIndex` mandates `Int64` (the host system being 64-bit), the error check can't be removed: from the point of view of the type system, there might be a runtime value that hits the error path.
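Concretely, the checked conversion carries an error branch, while an unchecked truncation does not:

  julia> Int32(typemax(Int64))
  ERROR: InexactError: trunc(Int32, 9223372036854775807)

  julia> typemax(Int64) % Int32   # unchecked truncation, no error path
  -1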

I'm assuming the integer wrangling occurs in types no wider than the native one (so their values are a subset of the native integer type's values); it seems like OP wants to leverage this for speed.

There was an old package for switching what type literals produce; it seems at least half-relevant here.

Yes, let me know if I should put together a minimal example, but essentially I need to iterate through a small `CartesianIndex(starts:ends)` on the GPU and apply the `mod1` function to it. I found that an innocent `index + 1` here and there already killed performance, and now I am replacing the Cartesian indices altogether. (Which seems to work, but makes the code look uglier.)

`mod1` is the core function that seems to hurt performance when one uses `Int64` instead of `Int32`.
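For reference, this is the kind of promotion that sneaks 64-bit arithmetic back in on a 64-bit host (the values are just examples):

  julia> i = Int32(5);

  julia> typeof(mod1(i, Int32(8)))
  Int32

  julia> typeof(i + 1)          # the literal 1 is an Int, i.e. Int64 here
  Int64

  julia> typeof(i + Int32(1))
  Int32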