How to round off a Float32 to Int32 on GPU

question
#1
using CUDAdrv, CUDAnative, CuArrays
#try to convert a Float32 to Int32
function addKernel!(x,y,θ)
    x = threadIdx().x+blockDim().x*(blockIdx().x-1)
    m = x*CUDAnative.cos(θ)+CUDAnative.sin(θ)*y
    z = Int32(m)
    x[z,z] = y[z,z]
    return
end
#create GPU arrays
N = 512
x = CuArray(fill(1.0f0,N,N))
y = CuArray(fill(1.0f0,N,N))

@device_code_warntype @cuda threads=(16,16) blocks = (32,32) addKernel!(x,y,Float32(1.0))

It doesn’t compile:

GPU compilation of addKernel!(CuDeviceArray{Float32,2,CUDAnative.AS.Global}, CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Float32) failed
KernelError: kernel returns a value of type `Union{}`

Make sure your kernel function ends in `return`, `return nothing` or `nothing`.
If the returned value is of type `Union{}`, your Julia code probably throws an exception.
Inspect the code with `@device_code_warntype` for more details.

Using round doesn’t work either. Any ideas?

#2

That error isn’t due to the conversion. You’re passing two arrays to your kernel, x and y, and then multiplying the array y by a scalar in CUDAnative.sin(θ)*y. You can kind of see that from the code_warntype output, since the broadcast is the last executed instruction before the kernel errors (the unreachable in the output):

│   %31 = Base.llvmcall::Core.IntrinsicFunction
│   %32 = (%31)(("declare float @__nv_sinf(float)", "%2 =  call float @__nv_sinf(float %0)\nret float %2"), Float32, Tuple{Float32}, θ)::Float32
│   %33 = invoke Base.broadcast(Base.:*::typeof(*), %32::Float32, _3::CuDeviceArray{Float32,2,CUDAnative.AS.Global})::Array{Float32,2}
│         (%30 + %33)
└──       $(Expr(:unreachable))
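
The same failure can be reproduced on the CPU. The snippet below is only a sketch: plain Arrays stand in for the CuDeviceArrays, and the names scaled and threw_method_error are made up for illustration. It shows that scalar*array produces an array, and that adding a scalar to that array has no method, which is why the kernel can only throw at that point.

```julia
# CPU-only sketch; a plain Array stands in for the CuDeviceArray argument `y`.
θ = 1.0f0
y = fill(1.0f0, 2, 2)   # plays the role of the kernel argument `y`
x = 3                   # after `x = threadIdx().x + ...`, `x` is an integer, not an array

scaled = sin(θ) * y     # scalar * array gives a new array

# scalar + array has no `+` method, so this expression throws
threw_method_error = try
    x * cos(θ) + scaled
    false
catch err
    err isa MethodError
end
```

Since the expression always throws, the inferred return type of the kernel is `Union{}`, which matches the KernelError message.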
#3

Oh, I made such a mistake! (I used the same name x.) But even after I change the kernel:

function addKernel!(a,b,θ)
    x = threadIdx().x+blockDim().x*(blockIdx().x-1)
    m = x*CUDAnative.cos(θ)+CUDAnative.sin(θ)*y
    z = Int32(m)
    a[z,z] = b[z,z]
    return
end

It still doesn’t work:

InvalidIRError: compiling addKernel!(CuDeviceArray{Float32,2,CUDAnative.AS.Global}, CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Float32) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_box_float32)
Stacktrace:
 [1] Type at float.jl:703
 [2] addKernel! at In[2]:6

So I think it still has something to do with type conversion.

#4

y is still undefined in that kernel? Assuming you meant the following:

julia> function addKernel!(a,b,θ)
           x = threadIdx().x+blockDim().x*(blockIdx().x-1)
           y = threadIdx().y+blockDim().y*(blockIdx().y-1)
           m = x*CUDAnative.cos(θ)+CUDAnative.sin(θ)*y
           z = Int32(m)
           a[z,z] = b[z,z]
           return
       end
addKernel! (generic function with 1 method)

You can use z = unsafe_trunc(Int32, m) to force an unsafe conversion. I had expected the box to work, though. Its performance would have been bad anyway, so the unsafe conversion is the better choice.
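
For reference, here is how the conversions differ on the CPU. This is a minimal sketch using only Base functions; the variable names are illustrative, and it has not been run inside a GPU kernel, so whether round compiles there is an assumption that depends on the CUDAnative version. The Int32 constructor checks that the value is an exact integer and throws otherwise, while unsafe_trunc skips all checks and simply drops the fractional part. To round to nearest rather than truncate, round first and then convert.

```julia
m = 2.7f0

# Int32(m) checks exactness and throws InexactError for non-integer values
constructor_threw = try
    Int32(m)
    false
catch err
    err isa InexactError
end

# unsafe_trunc drops the fractional part and performs no checks
z_trunc = unsafe_trunc(Int32, m)

# rounding first gives the nearest integer instead of truncation toward zero
z_round = unsafe_trunc(Int32, round(m))
```

Note that unsafe_trunc returns an arbitrary value when the input does not fit in the target type, so the index should be range-checked separately if m can be large.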

#5

Conversion using Int32 constructor now also works: https://github.com/JuliaGPU/CUDAnative.jl/pull/388
