Casting, annotations and numeric types for CUDAnative

maleadt · January 21, 2019, 9:36am

I can’t change this Float32 → Int32

Why not? Doing so just works…

apparently @cuDynamicSharedMem can only be 1 dimensional?

No, it can be higher-dimensional but you need to pass the size as a tuple. @cu...SharedMem macros are a bit of a hack and really should be a proper array type, at which point they should probably implement all of the common constructor syntaxes.

it allows me to say linear_index :: Int.. but not linear_index :: Int32..

Again, that just works here. Do know that generally you want to avoid such patterns since it generates quite ugly code (branches for checked arithmetic, allocs and calls to exception methods) that typically is unwanted in hot code. GPU intrinsics return Int32 values but the Julia functions return Int (for codegen reasons) so you can safely unsafe_trunc(Int32, ...) if you really want a 32-bit value.

Maybe you forgot to reserve memory for the dynamic shared memory (specified using the shmem keyword argument to @cuda)? Do note that your use of dynamic shared memory is kind-of peculiar, since it’s normally shared between threads but you only use a single thread. Doing so will probably hurt your occupancy. If you want truly local memory, why not use StaticArrays?

Anyway, here’s the code that works:

using CUDAdrv, CUDAnative, CuArrays

function l_0(x, y, z, w, h)
       return x + y*w + z*w*h
end

function l(x, y, z, w, h)
       _x = x - 1
       _y = y - 1
       _z = z - 1
       return l_0(_x, _y, _z, w, h) + 1
end

function kernel(out)
    x = blockIdx().x
    y = blockIdx().y
    w = gridDim().x
    h = gridDim().y
    z = 1

    arr = @cuDynamicSharedMem(Int32, (w, h, 3))
    arr[x,y,z] = Int32(1)
    linear_index::Int32  = l(x,y,z,w,h)
    arr[linear_index] = Int32(1) # still works

    out[x, y] = linear_index
    return nothing
end

function make_matrix(width :: Int, height :: Int)
    grid = (width, height)
    threads = (1,)

    cu_out = CuArray{Int32, 2}(undef, width, height)

    @cuda blocks=grid threads=threads shmem=sizeof(Int32)*prod(grid) kernel(cu_out)
    out = Array{Int32, 2}(cu_out)
    return out
end

function main()
    width = 10
    height = 10
    matrix = make_matrix(width, height)
    println(matrix)
end

main()

Topic		Replies	Views
CUDAnative: kernel multidimensional access GPU cudanative	3	1168	February 3, 2017
What is the recommended type <: Integer to use when doing index arithmetics? GPU cudanative	3	1466	July 24, 2018
CUDAnative dynamic allocation GPU question , cudanative	5	1810	March 4, 2020
CUDAnative: simple and nice way to define type for all integer/float literals?; size() returning UInt32? GPU	5	801	July 12, 2019
Mapping ThreadIdx().x to a 5D array? GPU	8	1206	June 15, 2018

Casting, annotations and numeric types for CUDAnative

Related topics