How to round off a Float32 to Int32 on GPU

11116 · April 14, 2019, 3:05am

using CUDAdrv, CUDAnative, CuArrays
#try to convert a Float32 to Int32
function addKernel!(x,y,θ)
    x = threadIdx().x+blockDim().x*(blockIdx().x-1)
    m = x*CUDAnative.cos(θ)+CUDAnative.sin(θ)*y
    z = Int32(m)
    x[z,z] = y[z,z]
    return
end
#create GPU arrays
N = 512
x = CuArray(fill(1.0f0,N,N))
y = CuArray(fill(1.0f0,N,N))

@device_code_warntype @cuda threads=(16,16) blocks = (32,32) addKernel!(x,y,Float32(1.0))

It doesn’t compile:

GPU compilation of addKernel!(CuDeviceArray{Float32,2,CUDAnative.AS.Global}, CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Float32) failed
KernelError: kernel returns a value of type `Union{}`

Make sure your kernel function ends in `return`, `return nothing` or `nothing`.
If the returned value is of type `Union{}`, your Julia code probably throws an exception.
Inspect the code with `@device_code_warntype` for more details.

Using round doesn’t work either. Any ideas?

maleadt · April 14, 2019, 10:18am

That error isn’t due to the conversion. Your kernel takes two arrays, x and y, so CUDAnative.sin(θ)*y multiplies the whole array y by a scalar instead of a single element. You can kind of see that in the code_warntype output, where the broadcast is the last thing to succeed before the kernel errors (the unreachable in the output):

│   %31 = Base.llvmcall::Core.IntrinsicFunction
│   %32 = (%31)(("declare float @__nv_sinf(float)", "%2 =  call float @__nv_sinf(float %0)\nret float %2"), Float32, Tuple{Float32}, θ)::Float32
│   %33 = invoke Base.broadcast(Base.:*::typeof(*), %32::Float32, _3::CuDeviceArray{Float32,2,CUDAnative.AS.Global})::Array{Float32,2}
│         (%30 + %33)
└──       $(Expr(:unreachable))
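
For illustration, the same failure can be reproduced on the CPU with plain Base Julia, with no GPU packages involved. This is only a sketch: `y` below is an ordinary `Array` standing in for the `CuDeviceArray` argument, and `i` is a hypothetical stand-in for the thread index that the kernel stores in `x`.

```julia
# CPU-only sketch of what the kernel body does with an array argument.
θ = 1.0f0
y = fill(1.0f0, 2, 2)   # stand-in for the array passed to the kernel
i = 3                   # stand-in for the computed thread index

# scalar * array is defined as a broadcast and allocates a new array.
product = sin(θ) * y
@assert product isa Matrix{Float32}

# scalar + array has no method, so the next step always throws.
# Code that always throws is inferred as returning Union{}.
err = try
    i * cos(θ) + product
catch e
    e
end
@assert err isa MethodError
println(typeof(err))
```

Indexing a single element first, for example `sin(θ) * y[1, 1]`, keeps the whole expression scalar.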

11116 · April 14, 2019, 2:25pm

Oh, I made such a mistake! (I used the same name x.) But even after I change the kernel:

function addKernel!(a,b,θ)
    x = threadIdx().x+blockDim().x*(blockIdx().x-1)
    m = x*CUDAnative.cos(θ)+CUDAnative.sin(θ)*y
    z = Int32(m)
    a[z,z] = b[z,z]
    return
end

It still doesn’t work:

InvalidIRError: compiling addKernel!(CuDeviceArray{Float32,2,CUDAnative.AS.Global}, CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Float32) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_box_float32)
Stacktrace:
 [1] Type at float.jl:703
 [2] addKernel! at In[2]:6

So I think it still has something to do with type conversion.

maleadt · April 14, 2019, 4:18pm

y is still undefined in that kernel? Assuming you meant the following:

julia> function addKernel!(a,b,θ)
           x = threadIdx().x+blockDim().x*(blockIdx().x-1)
           y = threadIdx().y+blockDim().y*(blockIdx().y-1)
           m = x*CUDAnative.cos(θ)+CUDAnative.sin(θ)*y
           z = Int32(m)
           a[z,z] = b[z,z]
           return
       end
addKernel! (generic function with 1 method)

You can use z = unsafe_trunc(Int32, m) to force an unchecked conversion. I had expected the boxing to work, though. Performance would have been bad anyway, so the unsafe conversion is the better choice.
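
As a rough illustration of the difference between the two conversions, here is a CPU-only sketch in plain Base Julia. The value `2.7f0` is an arbitrary example, and nothing here runs on the GPU; whether each call compiles inside a kernel depends on the package version. The checked constructor has an error path that throws `InexactError`, which is likely where the runtime call in the error above comes from.

```julia
# CPU-only sketch: checked vs. unchecked Float32 -> Int32 conversion.
m = 2.7f0

# Int32(m) is checked: it throws unless m is exactly an integer in range.
checked = try
    Int32(m)
catch e
    e
end
@assert checked isa InexactError

# unsafe_trunc drops the fractional part and performs no checks,
# so there is no error path. Out-of-range or NaN input gives an
# arbitrary value instead of an exception.
z = unsafe_trunc(Int32, m)
@assert z === Int32(2)

# To round to nearest rather than truncate, round in floating point first.
r = unsafe_trunc(Int32, round(m))
@assert r === Int32(3)

println((z, r))
```

Because `unsafe_trunc` never throws, it is worth checking that the resulting index is within the array bounds before using it.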

maleadt · April 16, 2019, 11:11am

Conversion using Int32 constructor now also works: https://github.com/JuliaGPU/CUDAnative.jl/pull/388
