Hi,
I thought SpecialFunctions.jl worked flawlessly with CUDA, but broadcasting `besselh` over a `CuArray` fails:
```julia
using CUDA, SpecialFunctions

N = 7000
M = CUDA.rand(N, N)
besselh.(0, M)
```
```
ERROR: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#35#37")(::CUDA.CuKernelContext, ::CuDeviceMatrix{ComplexF64, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{…}, Tuple{…}, typeof(besselh), Tuple{…}}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to ijl_lazy_load_and_lookup)
Stacktrace:
 [1] _besselh
   @ ~/.julia/packages/SpecialFunctions/npKKV/src/bessel.jl:214
 [2] multiple call sites
   @ unknown:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/validation.jl:147
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:460 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:459 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/utils.jl:103
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/utils.jl:97 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:136
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:115 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:111
 [10] compile
    @ ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:103 [inlined]
 [11] #1145
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/compilation.jl:254 [inlined]
 [12] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:52
 [13] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/driver.jl:42
 [14] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/75aiI/src/compiler/compilation.jl:253
 [15] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/execution.jl:128
 [16] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/nWT2N/src/execution.jl:103
 [17] macro expansion
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:369 [inlined]
 [18] macro expansion
    @ ./lock.jl:267 [inlined]
 [19] cufunction(f::GPUArrays.var"#35#37", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{…}, Base.Broadcast.Broadcasted{…}, Int64}}; kwargs::@Kwargs{})
    @ CUDA ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:364
 [20] cufunction
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:361 [inlined]
 [21] macro expansion
    @ ~/.julia/packages/CUDA/75aiI/src/compiler/execution.jl:112 [inlined]
 [22] #launch_heuristic#1204
    @ ~/.julia/packages/CUDA/75aiI/src/gpuarrays.jl:17 [inlined]
 [23] launch_heuristic
    @ ~/.julia/packages/CUDA/75aiI/src/gpuarrays.jl:15 [inlined]
 [24] _copyto!
    @ ~/.julia/packages/GPUArrays/WuHrl/src/host/broadcast.jl:78 [inlined]
 [25] copyto!
    @ ~/.julia/packages/GPUArrays/WuHrl/src/host/broadcast.jl:44 [inlined]
 [26] copy
    @ ~/.julia/packages/GPUArrays/WuHrl/src/host/broadcast.jl:29 [inlined]
 [27] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2, CUDA.DeviceMemory}, Nothing, typeof(besselh), Tuple{Int64, CuArray{Float32, 2, CUDA.DeviceMemory}}})
    @ Base.Broadcast ./broadcast.jl:903
 [28] top-level scope
    @ REPL[40]:1
Some type information was truncated. Use `show(err)` to see complete types.
```
This is Ubuntu 20.04 on an Intel i9 with an NVIDIA RTX 3060 (12 GB), which works fine with other CUDA workloads (including GPU kernels written in Julia). I run Julia 1.10.3.
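If I read the error right, `ijl_lazy_load_and_lookup` suggests that `besselh` ends up in a `ccall` into the compiled AMOS routines shipped with SpecialFunctions.jl, which GPUCompiler cannot translate to PTX. In the meantime, here is an untested sketch of the CPU round-trip I would fall back to (it assumes the host has memory for the full matrix; note that `besselh` promotes the `Float32` input to a `ComplexF64` result):

```julia
# Workaround sketch: evaluate besselh on the host, then copy back to the device.
using CUDA, SpecialFunctions

N = 7000
M = CUDA.rand(N, N)           # Float32 matrix on the GPU

M_host = Array(M)             # device -> host copy
H_host = besselh.(0, M_host)  # broadcast on the CPU; eltype is ComplexF64
H = CuArray(H_host)           # host -> device copy
```

A pure-Julia implementation such as Bessels.jl might broadcast directly on the GPU, since it avoids the `ccall`, but I have not verified that.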
Best,
Ferran.