Problem with CUDA intrinsic pow: pow(y[1,1], 2.0)?

Hi again,

(sorry for all the questions!)

I am a bit puzzled that this does not work. I want to use the power function on values in a matrix. If I uncomment x = 1.0, it does work.

function kernel(ydepK::CuDeviceMatrix{Float32})
	x = ydepK[1,1]
	# x = 1.0
	y = CUDAnative.pow(x,2.0)
	return nothing
end

function cutest()
	y = rand(Float32,3,4)
	cuy = CuArray(y)
	@cuda blocks=2 threads=2 kernel(cuy)
end

Output in case 1 (with x = ydepK[1,1]):

julia> cudaVFI.cutest()
┌ Debug: (Re)compiling function
│   ctx = CUDAnative.CompilerContext(CUDAnative.KernelWrapper{typeof(Main.cudaVFI.kernel)}(Main.cudaVFI.kernel), Tuple{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}}, v"6.1.0", true, nothing, nothing, nothing, nothing, nothing, Main.cudaVFI.kernel)
└ @ CUDAnative compiler.jl:494
ERROR: could not compile kernel(CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}) for GPU; kernel returning a value

- return_type = Union{}
Stacktrace:
 [1] #compiler_error#43 at /home/floswald/.julia/packages/CUDAnative/mXUk/src/compiler.jl:33 [inlined]
 [2] (::getfield(CUDAnative, Symbol("#kw##compiler_error")))(::NamedTuple{(:return_type,),Tuple{Core.TypeofBottom}}, ::typeof(CUDAnative.compiler_error), ::CUDAnative.CompilerContext, ::String) at ./<missing>:0
 [3] validate_invocation(::CUDAnative.CompilerContext) at /home/floswald/.julia/packages/CUDAnative/mXUk/src/validation.jl:15
 [4] compile_function(::CUDAnative.CompilerContext) at /home/floswald/.julia/packages/CUDAnative/mXUk/src/compiler.jl:496
 [5] #cufunction#78(::Base.Iterators.Pairs{Symbol,typeof(Main.cudaVFI.kernel),Tuple{Symbol},NamedTuple{(:inner_f,),Tuple{typeof(Main.cudaVFI.kernel)}}}, ::Function, ::CUDAdrv.CuDevice, ::Function, ::Type) at /home/floswald/.julia/packages/CUDAnative/mXUk/src/compiler.jl:572
 [6] (::getfield(CUDAnative, Symbol("#kw##cufunction")))(::NamedTuple{(:inner_f,),Tuple{typeof(Main.cudaVFI.kernel)}}, ::typeof(CUDAnative.cufunction), ::CUDAdrv.CuDevice, ::Function, ::Type) at ./<missing>:0
 [7] @generated body at /home/floswald/.julia/packages/CUDAnative/mXUk/src/execution.jl:214 [inlined]
 [8] _cuda at /home/floswald/.julia/packages/CUDAnative/mXUk/src/execution.jl:171 [inlined]
 [9] macro expansion at ./gcutils.jl:87 [inlined]
 [10] cutest() at /home/floswald/git/VFI/Julia/cudaVFI/src/aldrich.jl:214
 [11] top-level scope

julia> include("cudaVFI.jl")
WARNING: replacing module cudaVFI.
Main.cudaVFI

Output in case 2 (with x = 1.0 uncommented):

julia> cudaVFI.cutest()
┌ Debug: (Re)compiling function
│   ctx = CUDAnative.CompilerContext(CUDAnative.KernelWrapper{typeof(Main.cudaVFI.kernel)}(Main.cudaVFI.kernel), Tuple{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}}, v"6.1.0", true, nothing, nothing, nothing, nothing, nothing, Main.cudaVFI.kernel)
└ @ CUDAnative compiler.jl:494
┌ Debug: Module entry point: 
│   LLVM.name(entry) = "ptxcall_kernel_28"
└ @ CUDAnative utils.jl:7
┌ Debug: Compiled CUDAnative.KernelWrapper{typeof(Main.cudaVFI.kernel)}(Main.cudaVFI.kernel) to PTX 6.1.0 for SM 6.1.0 using 2 registers.
│ Memory usage: 0 B local, 0 B shared, 0 B constant
└ @ CUDAnative compiler.jl:584

There is no pow(::Float32, ::Float64): https://github.com/JuliaGPU/CUDAnative.jl/blob/b56946eded59b76854436072c7e0b47f0b89ea30/src/device/libdevice.jl#L188-L192

There should probably be an alias converting the first argument to Float64. Feel free to open an issue/PR :slightly_smiling_face:
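For what it's worth, here is a minimal sketch of a workaround on the caller's side, assuming the (Float32, Float32) and (Float32, Int32) methods from the linked libdevice.jl wrappers: keep the exponent in a type that matches one of the existing signatures. (That is also why uncommenting x = 1.0 makes case 2 compile: both arguments are then Float64, for which a method exists.)

function kernel(ydepK::CuDeviceMatrix{Float32})
	x = ydepK[1,1]
	# Float32 exponent matches pow(::Float32, ::Float32)
	y = CUDAnative.pow(x, 2.0f0)
	# alternatively, an Int32 exponent would hit the integer-power wrapper:
	# y = CUDAnative.pow(x, Int32(2))
	return nothing
end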

Argh, how dumb! I'm still a bit confused about having to store arrays as Float32 but then call functions with Float64 arguments. Thanks!
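Side note on the Float32/Float64 point: you don't actually need Float64 anywhere here. Julia writes Float32 literals as 2.0f0 (or 2f0), so the exponent can stay in the same precision as the array elements. A quick check at the REPL:

julia> typeof(2.0), typeof(2.0f0)
(Float64, Float32)

julia> 2.0f0^2.0f0
4.0f0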