CUDA.math functions in Julia

Hi,

I’m new to CUDA in Julia. Is there any way to use the CUDA math functions in Julia?

Frank

I think all fundamental mathematical functions are readily defined for CuArrays (arrays stored on the GPU). The same should hold for their use in kernels you write. Just try it out and see :slight_smile:
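For example, something along these lines should work out of the box (untested sketch; the array contents and names are just placeholders):

```julia
using CUDA

x = cu(rand(Float32, 4))   # array stored on the GPU
y = sin.(x) .+ exp.(x)     # the fused broadcast compiles to a single GPU kernel
```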

Thanks! I'm actually looking for special functions like the Bessel functions, which are defined in CUDA Math.

There is the SpecialFunctions.jl package, which provides Bessel functions etc., and CUDA.jl overrides those definitions for GPU code. So I think it should work if you just import both SpecialFunctions and CUDA.

Ok, I’ve just tried

julia> using SpecialFunctions
julia> using CUDA
julia> besselj0.(cu(rand(Float32,3)))

and it failed with an InvalidIRError: unsupported call through a literal pointer.

Stack trace
ERROR: InvalidIRError: compiling kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, typeof(besselj0), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to j0f)
Stacktrace:
 [1] besselj0
   @ ~/.julia/packages/SpecialFunctions/L13jJ/src/bessel.jl:199
 [2] _broadcast_getindex_evalf
   @ broadcast.jl:648
 [3] _broadcast_getindex
   @ broadcast.jl:621
 [4] getindex
   @ broadcast.jl:575
 [5] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:62
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12", Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, typeof(besselj0), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwWPj/src/validation.jl:123
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/XwWPj/src/driver.jl:288 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/ZmKD7/src/TimerOutput.jl:206 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/XwWPj/src/driver.jl:286 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module, kernel::LLVM.Function; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwWPj/src/utils.jl:62
  [6] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/M4jkK/src/compiler/execution.jl:306
  [7] check_cache
    @ ~/.julia/packages/GPUCompiler/XwWPj/src/cache.jl:44 [inlined]
  [8] cached_compilation
    @ ~/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:60 [inlined]
  [9] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12", Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, typeof(besselj0), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}}, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/XwWPj/src/cache.jl:0
 [10] cufunction(f::GPUArrays.var"#broadcast_kernel#12", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, typeof(besselj0), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/M4jkK/src/compiler/execution.jl:294
 [11] cufunction
    @ ~/.julia/packages/CUDA/M4jkK/src/compiler/execution.jl:288 [inlined]
 [12] macro expansion
    @ ~/.julia/packages/CUDA/M4jkK/src/compiler/execution.jl:102 [inlined]
 [13] #launch_heuristic#280
    @ ~/.julia/packages/CUDA/M4jkK/src/gpuarrays.jl:17 [inlined]
 [14] launch_heuristic
    @ ~/.julia/packages/CUDA/M4jkK/src/gpuarrays.jl:17 [inlined]
 [15] copyto!
    @ ~/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:66 [inlined]
 [16] copyto!(dest::CuArray{Float32, 1}, bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(besselj0), Tuple{CuArray{Float32, 1}}})
    @ Base.Broadcast ./broadcast.jl:935
 [17] copy(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(besselj0), Tuple{CuArray{Float32, 1}}})
    @ Base.Broadcast ./broadcast.jl:907
 [18] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, typeof(besselj0), Tuple{CuArray{Float32, 1}}})
    @ Base.Broadcast ./broadcast.jl:882
 [19] top-level scope
    @ REPL[4]:1

# on some nightly version
Julia Version 1.7.0-DEV.252
Commit 33237ef7ad (2021-01-11 09:49 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.0 (ORCJIT, skylake-avx512)

But maybe I'm doing something wrong.


CUDA.jl < 3 calls them j0 and so on, instead of besselj0.
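So on CUDA.jl versions before 3, a workaround might be to call CUDA.jl's own wrapper directly instead of the SpecialFunctions name. An untested sketch (this assumes CUDA.j0 is available as an unexported device function in your CUDA.jl version):

```julia
using CUDA

x = cu(rand(Float32, 3))

# CUDA.j0 is a device-side function, so it is only meaningful inside GPU
# code, e.g. when broadcast over a CuArray; calling it on the CPU will fail.
y = CUDA.j0.(x)
```

On CUDA.jl 3 and later, the SpecialFunctions names themselves should work, so `besselj0.(x)` from the original snippet would be the idiomatic spelling there.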