Use of CartesianIndices with CUDA?

Ferran_Mazzanti · June 27, 2020, 5:26pm

Hi,
I have a function that iterates over all elements of a four-dimensional array of CartesianIndex elements:

21×31×41×51 Array{CartesianIndex{4},4}:
[:, :, 1, 1] =
 CartesianIndex(1, 1, 1, 1)   …  CartesianIndex(1, 31, 1, 1)
 CartesianIndex(2, 1, 1, 1)      CartesianIndex(2, 31, 1, 1)
 CartesianIndex(3, 1, 1, 1)      CartesianIndex(3, 31, 1, 1)
 CartesianIndex(4, 1, 1, 1)      CartesianIndex(4, 31, 1, 1)
 CartesianIndex(5, 1, 1, 1)      CartesianIndex(5, 31, 1, 1)
...

The work the function does is 100% parallelizable: it performs some independent operations for each element of the array and takes the product at the end (not really the product, but something similar). That seems perfect for CUDA, as there are many elements to process, each one independent of the rest.

Now the question is: can I work directly with these CartesianIndex elements in CUDA? Is this implemented? Or should I convert them to a 4-dimensional array and work with that? If I have to convert to an array, how do I properly write a nested for loop (one for each dimension of the CartesianIndex, and here I have 4) that takes full advantage of CUDA's parallel capabilities?

Thanks a lot,

Ferran.

dpsanders · June 27, 2020, 9:33pm

You can use CartesianIndex objects on the GPU – did you try it? But the real answer to your question depends on how you’re going to use them.

EDIT: You will need to provide more information about that for anyone to be able to give a useful answer.

Ferran_Mazzanti · June 28, 2020, 7:36am

For instance, this throws an error on my machine:

using CuArrays
using CuArrays.CURAND
using CUDAnative
using CUDAdrv

# include the path to user-defined modules
# 
#push!(LOAD_PATH,homedir()*"/Julia_1/Modules");
#push!(LOAD_PATH,homedir()*"/Julia_1/Modules/RBM");
#push!(LOAD_PATH,homedir()*"/Julia_1/Modules/CUDA");

A = rand(3,4,5)

aux_CI = CartesianIndices(A)

display(aux_CI)
println()
println(size(aux_CI))
println(length(aux_CI))

tot = CuArrays.zeros(length(aux_CI))

function bucle_1(y,tot)
    index    = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride   = blockDim().x * gridDim().x
    for i    = index:stride:length(y)
        tot[i] = y[1]+y[2]
    end
end;

numblocks      = 256
@cuda threads  = 256 blocks = numblocks bucle_1(aux_CI,tot)

I typed it quickly, so I may have made mistakes, but anyway…

Best regards,

Ferran.

dpsanders · June 28, 2020, 5:46pm

aux_CI is an array that lives on the CPU. You need to make a CuArray version.

Ellipse0934 · July 1, 2020, 7:05am

Change

tot[i] = y[1]+y[2]

to

tot[i] = y[i][1]+y[i][2]

You can pass CartesianIndices(x) directly to GPU kernels: it carries only OneTo(len) ranges, one per dimension, instead of constructing an array.
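The point above can be checked on the CPU without any GPU package. The following is only a sketch: the plain loop stands in for the kernel body, and the sizes are the ones from the example earlier in the thread. It shows that a CartesianIndices object is a small bits type (which is what makes it usable as a kernel argument) and that y[i] returns the i-th CartesianIndex in column-major order.

```julia
A = rand(3, 4, 5)
y = CartesianIndices(A)          # lazy: stores one range per dimension

# Being a plain bits type is what allows it to be passed as a kernel argument.
@assert isbits(y)

# CPU stand-in for the corrected loop body: y[i] is the i-th CartesianIndex
# (column-major order), and y[i][d] is its component along dimension d.
tot = zeros(Int, length(y))
for i in eachindex(tot)
    tot[i] = y[i][1] + y[i][2]
end

# The 4th linear index of a 3×4×5 array is (1, 2, 1), so the sum is 3.
@assert y[4] == CartesianIndex(1, 2, 1)
@assert tot[4] == 3
```

In the original kernel, y[1] + y[2] tries to add two CartesianIndex objects and store the result in a numeric array, which is why the indexing has to go through y[i] first.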

duanestorti · September 5, 2020, 1:24am

Is there a place to find examples using CartesianIndex objects on GPU? Is it as straightforward as defining the CartesianIndices object, passing it to the kernel function, and then accessing elements using a linear index constructed from the block and thread index values?

maleadt · September 7, 2020, 3:01pm

Yes, there are several kernels like that in CUDA.jl and GPUArrays.jl, e.g., https://github.com/JuliaGPU/GPUArrays.jl/blob/b988cdcc81011ded7223f250d127a1e544ea2d2a/src/host/broadcast.jl#L53-L72
(where @cartesianidx is a simple macro that creates an iterator based on the current block/thread index).
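As a rough illustration of the pattern (not the GPUArrays.jl implementation), the sketch below emulates a grid-stride launch on the CPU. The function name fill_sums! and the launch sizes are made up for the example. On a GPU, index and stride would instead be computed from threadIdx(), blockIdx(), blockDim() and gridDim(), as in bucle_1 earlier in the thread. Each "thread" turns its linear index into a CartesianIndex and uses that to index the 4-D array directly, so no nested loops are needed.

```julia
# Kernel body written as a plain function so it can be checked on the CPU.
# `index` and `stride` are passed in here; in a real kernel they would come
# from the block/thread intrinsics.
function fill_sums!(tot, index, stride)
    R = CartesianIndices(tot)
    for i in index:stride:length(R)
        I = R[i]                 # linear index -> CartesianIndex{4}
        tot[I] = sum(Tuple(I))   # index the 4-D array directly with I
    end
    return nothing
end

tot = zeros(Int, 2, 3, 4, 5)     # 120 elements

# Emulated launch: deliberately fewer "threads" (32) than elements (120),
# so every thread handles several elements through the stride.
threads, blocks = 8, 4
for b in 1:blocks, t in 1:threads
    index  = (b - 1) * threads + t
    stride = threads * blocks
    fill_sums!(tot, index, stride)
end

# Every element was written with the sum of its own coordinates.
@assert all(tot[I] == sum(Tuple(I)) for I in CartesianIndices(tot))
```

Because the mapping from linear index to CartesianIndex is handled by R[i], the same body works for any number of dimensions.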
