using CUDAdrv, CUDAnative, CuArrays
# try to convert a Float32 to Int32
function addKernel!(x,y,θ)
    x = threadIdx().x+blockDim().x*(blockIdx().x-1)
    m = x*CUDAnative.cos(θ)+CUDAnative.sin(θ)*y
    z = Int32(m)
    x[z,z] = y[z,z]
    return
end
# create GPU arrays
N = 512
x = CuArray(fill(1.0f0,N,N))
y = CuArray(fill(1.0f0,N,N))
@device_code_warntype @cuda threads=(16,16) blocks=(32,32) addKernel!(x,y,Float32(1.0))
It doesn’t compile:
GPU compilation of addKernel!(CuDeviceArray{Float32,2,CUDAnative.AS.Global}, CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Float32) failed
KernelError: kernel returns a value of type `Union{}`
Make sure your kernel function ends in `return`, `return nothing` or `nothing`.
If the returned value is of type `Union{}`, your Julia code probably throws an exception.
Inspect the code with `@device_code_warntype` for more details.
That error isn’t due to the conversion. You’re passing two arrays to your kernel, x and y, and then multiplying the array y by a scalar in CUDAnative.sin(θ)*y. You can kind of see that from the code_warntype output, since it is the last instruction executed before the kernel errors (the unreachable in the output):
Oh, I made such a mistake! (I used the same name x.) But even after I change the kernel to
function addKernel!(a,b,θ)
    x = threadIdx().x+blockDim().x*(blockIdx().x-1)
    m = x*CUDAnative.cos(θ)+CUDAnative.sin(θ)*y
    z = Int32(m)
    a[z,z] = b[z,z]
    return
end
It still doesn’t work:
InvalidIRError: compiling addKernel!(CuDeviceArray{Float32,2,CUDAnative.AS.Global}, CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Float32) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_box_float32)
Stacktrace:
[1] Type at float.jl:703
[2] addKernel! at In[2]:6
So I think it still has something to do with type conversion.
y is still undefined in that kernel. I assume you meant the following:
julia> function addKernel!(a,b,θ)
           x = threadIdx().x+blockDim().x*(blockIdx().x-1)
           y = threadIdx().y+blockDim().y*(blockIdx().y-1)
           m = x*CUDAnative.cos(θ)+CUDAnative.sin(θ)*y
           z = Int32(m)
           a[z,z] = b[z,z]
           return
       end
addKernel! (generic function with 1 method)
You can use z = unsafe_trunc(Int32, m) to force an unchecked conversion. I had expected the box to work, though. Performance would have been bad in any case, so the unsafe conversion is the better choice.
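To see why the two conversions behave differently, here is a minimal sketch in plain CPU Julia (no GPU packages involved). The checked constructor Int32(m) throws an InexactError when m is not an exact integer, and building that exception is most likely what pulls in the jl_box_float32 runtime call reported above. unsafe_trunc simply truncates toward zero and never throws. The variable names m_exact and m_inexact are made up for illustration.

```julia
# Plain CPU Julia; no GPU packages are needed to see the difference.
m_exact   = 2.0f0   # hypothetical value that is exactly an integer
m_inexact = 1.5f0   # hypothetical value with a fractional part

# Checked conversion: succeeds only when the value is integral and in range.
@assert Int32(m_exact) === Int32(2)

# For a non-integral value the checked conversion throws InexactError.
threw = try
    Int32(m_inexact)
    false
catch err
    err isa InexactError
end
@assert threw

# Unchecked conversion: truncates toward zero and never throws.
@assert unsafe_trunc(Int32, m_inexact) === Int32(1)
@assert unsafe_trunc(Int32, -1.5f0) === Int32(-1)
```

Note that unsafe_trunc gives an arbitrary result when the value does not fit in the target type, so in a kernel it is still up to you to make sure z is a valid index before using it.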