SpecialFunctions and CUDA

Hi,
I thought SpecialFunctions.jl worked flawlessly with CUDA, but I get this:

using CUDA

N = 7000
M = CUDA.rand(N,N)
besselh.(0,M)

ERROR: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#35#37")(::CUDA.CuKernelContext, ::CuDeviceMatrix{ComplexF64, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{…}, Tuple{…}, typeof(besselh), Tuple{…}}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to ijl_lazy_load_and_lookup)
Stacktrace:
 [1] _besselh
   @ ~/.julia/packages/SpecialFunctions/npKKV/src/bessel.jl:214
 [2] multiple call sites
   @ unknown:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/validation.jl:147
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:460 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:459 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/utils.jl:103
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/utils.jl:97 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:136
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:115 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:111
 [10] compile
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:103 [inlined]
 [11] #1145
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/compilation.jl:254 [inlined]
 [12] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:52
 [13] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:42
 [14] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/75aiI/src/compiler/compilation.jl:253
 [15] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/execution.jl:128
 [16] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/execution.jl:103
 [17] macro expansion
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:369 [inlined]
 [18] macro expansion
    @ ./lock.jl:267 [inlined]
 [19] cufunction(f::GPUArrays.var"#35#37", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{…}, Base.Broadcast.Broadcasted{…}, Int64}}; kwargs::@Kwargs{})
    @ CUDA ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:364
 [20] cufunction
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:361 [inlined]
 [21] macro expansion
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:112 [inlined]
 [22] #launch_heuristic#1204
    @ ~/.julia/packages/CUDA/75aiI/src/gpuarrays.jl:17 [inlined]
 [23] launch_heuristic
    @ ~/.julia/packages/CUDA/75aiI/src/gpuarrays.jl:15 [inlined]
 [24] _copyto!
    @ ~/.julia/packages/GPUArrays/WuHrl/src/host/broadcast.jl:78 [inlined]
 [25] copyto!
    @ ~/.julia/packages/GPUArrays/WuHrl/src/host/broadcast.jl:44 [inlined]
 [26] copy
    @ ~/.julia/packages/GPUArrays/WuHrl/src/host/broadcast.jl:29 [inlined]
 [27] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.DeviceMemory}, Nothing, typeof(besselh), Tuple{Int64, CuArray{Float32, 2, CUDA.DeviceMemory}}})
    @ Base.Broadcast ./broadcast.jl:903
 [28] top-level scope
    @ REPL[40]:1
Some type information was truncated. Use `show(err)` to see complete types.

This is Ubuntu 20.04 on an Intel i9 with an NVIDIA 3060 (12 GB RAM), which works well with other CUDA workloads (including CUDA programming in Julia). I run Julia 1.10.3.

Best,

Ferran.

SpecialFunctions.jl uses a library called openspecfun.

NVIDIA does seem to provide at least some of them in the CUDA Math API Reference Manual,

and CUDA.jl provides device replacements for SpecialFunctions.jl

But besselh is not one of them. Since besselh can be written in terms of besselj and bessely (H^(1)_ν(x) = J_ν(x) + i Y_ν(x)), you could open an issue or PR against CUDA.jl.
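In the meantime, a workaround may be possible for integer order 0 (and similarly 1), since CUDA.jl's device replacements do cover besselj0 and bessely0. This is a hedged sketch, not tested on a GPU; hankelh0 is a helper name I made up here:

```julia
using SpecialFunctions   # besselj0 / bessely0; CUDA.jl overrides these in device code
# using CUDA             # uncomment on a machine with a GPU

# H^(1)_0(x) = J_0(x) + i*Y_0(x)
hankelh0(x) = complex(besselj0(x), bessely0(x))

# On the CPU this matches besselh(0, x):
hankelh0(1.0) ≈ besselh(0, 1.0)

# On the GPU, the broadcast should compile because only
# besselj0/bessely0 are called inside the kernel (untested assumption):
# M = CUDA.rand(7000, 7000)
# H = hankelh0.(M)
```

If that compiles, the same pattern with besselj1/bessely1 would cover order 1; general (non-integer) orders would still need a proper device implementation.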

That said, for Bessel functions you likely want to use Bessels.jl, which on the CPU at least is often ~10× faster. I'm not sure whether its code is GPU-compatible by default.
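Since Bessels.jl is pure Julia (no ccall into openspecfun), there is at least a chance its kernels compile under GPUCompiler. A minimal sketch, with the GPU part left as an untested assumption:

```julia
using Bessels   # pure-Julia Bessel implementations, CPU-tested here

x = rand(10_000)
y = Bessels.besselj0.(x)   # CPU broadcast; J_0(0) == 1 as a sanity check

# Untested assumption: if the implementation avoids GPU-hostile constructs,
# the same broadcast may work on a CuArray:
# using CUDA
# yd = Bessels.besselj0.(CUDA.rand(10_000))
```

If the GPU broadcast throws the same InvalidIRError, that would answer the compatibility question and be worth reporting to Bessels.jl.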