CUDA with IJulia results in unexpected errors

Hey,

I’m observing a strange bug. The code base is quite large, so I couldn’t really reduce it to a minimal example.

But what I observe:

Calling my complex function (4D CUDA arrays, FFT) for the first time results in this error:
Internal error: encountered unexpected error during compilation of #cached_compilation#107:
ErrorException("unsupported or misplaced expression "return" in function #cached_compilation#107")
jl_errorf at /buildworker/worker/package_linux64/build/src/rtutils.c:77
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:4570
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:4009
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:4251 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6814
jl_emit_code at /buildworker/worker/package_linux64/build/src/codegen.cpp:7160
jl_emit_codeinst at /buildworker/worker/package_linux64/build/src/codegen.cpp:7205
_jl_compile_codeinst at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:124
jl_generate_fptr at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:352
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1957
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:2223 [inlined]
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2216 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2406
cached_compilation at /home/fxw/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:65
unknown function (ip: 0x7f7da81797d1)
#cufunction#796 at /home/fxw/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:289
cufunction at /home/fxw/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:286
macro expansion at /home/fxw/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:100 [inlined]
#launch_heuristic#857 at /home/fxw/.julia/packages/CUDA/wTQsK/src/gpuarrays.jl:17 [inlined]
launch_heuristic at /home/fxw/.julia/packages/CUDA/wTQsK/src/gpuarrays.jl:17 [inlined]
copyto! at /home/fxw/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:66 [inlined]
copyto! at ./broadcast.jl:936 [inlined]
copy at ./broadcast.jl:908 [inlined]
materialize at ./broadcast.jl:883 [inlined]
broadcast at ./broadcast.jl:821 [inlined]
copy1 at /home/fxw/.julia/packages/CUDA/wTQsK/lib/cufft/util.jl:27
complexfloat at /home/fxw/.julia/packages/CUDA/wTQsK/lib/cufft/util.jl:20 [inlined]
fft at /home/fxw/.julia/packages/CUDA/wTQsK/lib/cufft/fft.jl:238 [inlined]
fft at /home/fxw/.julia/packages/CUDA/wTQsK/lib/cufft/fft.jl:238
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2224 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2406
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:115
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:204
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:155 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:561
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:669
top-level scope at In[9]:1
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:879
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:827
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:931
eval at ./boot.jl:360 [inlined]
include_string at ./loading.jl:1090
softscope_include_string at /home/fxw/.julia/packages/SoftGlobalScope/u4UzH/src/SoftGlobalScope.jl:65
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2224 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2406
execute_request at /home/fxw/.julia/packages/IJulia/e8kqU/src/execute_request.jl:67
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2224 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2406
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:672
jl_f__apply_latest at /buildworker/worker/package_linux64/build/src/builtins.c:722
#invokelatest#2 at ./essentials.jl:707 [inlined]
invokelatest at ./essentials.jl:706 [inlined]
eventloop at /home/fxw/.julia/packages/IJulia/e8kqU/src/eventloop.jl:8
#15 at ./task.jl:406
unknown function (ip: 0x7f7f74e3e36c)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2224 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2406
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))
Internal error: encountered unexpected error during compilation of #cached_compilation#107:
ErrorException("unsupported or misplaced expression "return" in function #cached_compilation#107")
jl_errorf at /buildworker/worker/package_linux64/build/src/rtutils.c:77
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:4570
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:4009
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:4251 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6814
jl_emit_code at /buildworker/worker/package_linux64/build/src/codegen.cpp:7160
jl_emit_codeinst at /buildworker/worker/package_linux64/build/src/codegen.cpp:7205
_jl_compile_codeinst at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:124
jl_generate_fptr_for_unspecialized at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:396
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1963
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:2223 [inlined]
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2216 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2406
cached_compilation at /home/fxw/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:65
unknown function (ip: 0x7f7da81797d1)
#cufunction#796 at /home/fxw/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:289
cufunction at /home/fxw/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:286
macro expansion at /home/fxw/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:100 [inlined]
#launch_heuristic#857 at /home/fxw/.julia/packages/CUDA/wTQsK/src/gpuarrays.jl:17 [inlined]
launch_heuristic at /home/fxw/.julia/packages/CUDA/wTQsK/src/gpuarrays.jl:17 [inlined]
copyto! at /home/fxw/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:66 [inlined]
copyto! at ./broadcast.jl:936 [inlined]
copy at ./broadcast.jl:908 [inlined]
materialize at ./broadcast.jl:883 [inlined]
broadcast at ./broadcast.jl:821 [inlined]
copy1 at /home/fxw/.julia/packages/CUDA/wTQsK/lib/cufft/util.jl:27
complexfloat at /home/fxw/.julia/packages/CUDA/wTQsK/lib/cufft/util.jl:20 [inlined]
fft at /home/fxw/.julia/packages/CUDA/wTQsK/lib/cufft/fft.jl:238 [inlined]
fft at /home/fxw/.julia/packages/CUDA/wTQsK/lib/cufft/fft.jl:238
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2224 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2406
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:115
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:204
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:155 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:561
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:669
top-level scope at In[9]:1
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:879
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:827
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:931
eval at ./boot.jl:360 [inlined]
include_string at ./loading.jl:1090
softscope_include_string at /home/fxw/.julia/packages/SoftGlobalScope/u4UzH/src/SoftGlobalScope.jl:65
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2224 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2406
execute_request at /home/fxw/.julia/packages/IJulia/e8kqU/src/execute_request.jl:67
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2224 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2406
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:672
jl_f__apply_latest at /buildworker/worker/package_linux64/build/src/builtins.c:722
#invokelatest#2 at ./essentials.jl:707 [inlined]
invokelatest at ./essentials.jl:706 [inlined]
eventloop at /home/fxw/.julia/packages/IJulia/e8kqU/src/eventloop.jl:8
#15 at ./task.jl:406
unknown function (ip: 0x7f7f74e3e36c)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2224 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2406
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))

MethodError: no method matching Base.CodegenParams(; track_allocations=false, code_coverage=false, static_alloc=false, prefer_specsig=true, emit_function=GPUCompiler.var"#hook_emit_function#45"{GPUCompiler.MethodCompileTracer}(GPUCompiler.MethodCompileTracer(PTX CompilerJob of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceVector{ComplexF64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF64}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) for sm_75, Core.MethodInstance[MethodInstance for (::GPUArrays.var"#broadcast_kernel#12")(::CUDA.CuKernelContext, ::CuDeviceVector{ComplexF64, 1}, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF64}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64)], #undef)), emitted_function=GPUCompiler.var"#hook_emitted_function#46"{GPUCompiler.MethodCompileTracer}(GPUCompiler.MethodCompileTracer(PTX CompilerJob of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceVector{ComplexF64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF64}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) for sm_75, Core.MethodInstance[MethodInstance for (::GPUArrays.var"#broadcast_kernel#12")(::CUDA.CuKernelContext, ::CuDeviceVector{ComplexF64, 1}, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF64}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64)], #undef)), gnu_pubnames=false, debug_info_kind=0)
Closest candidates are:
  Base.CodegenParams(; track_allocations, code_coverage, prefer_specsig, gnu_pubnames, debug_info_kind, lookup, generic_context) at reflection.jl:1023 got unsupported keyword arguments "static_alloc", "emit_function", "emitted_function"

Stacktrace:
  [1] kwerr(kw::NamedTuple{(:track_allocations, :code_coverage, :static_alloc, :prefer_specsig, :emit_function, :emitted_function, :gnu_pubnames, :debug_info_kind), Tuple{Bool, Bool, Bool, Bool, GPUCompiler.var"#hook_emit_function#45"{GPUCompiler.MethodCompileTracer}, GPUCompiler.var"#hook_emitted_function#46"{GPUCompiler.MethodCompileTracer}, Bool, Int32}}, args::Type)
    @ Base ./error.jl:157
  [2] compile_method_instance(job::GPUCompiler.CompilerJob, method_instance::Core.MethodInstance, world::UInt64)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/irgen.jl:119
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/4QAIk/src/TimerOutput.jl:206 [inlined]
  [4] irgen(job::GPUCompiler.CompilerJob, method_instance::Core.MethodInstance, world::UInt64)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/irgen.jl:334
  [5] macro expansion
    @ ~/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:94 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/TimerOutputs/4QAIk/src/TimerOutput.jl:206 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:93
  [8] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:39
  [9] compile
    @ ~/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:35 [inlined]
 [10] cufunction_compile(source::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:302
 [11] cufunction_compile(source::GPUCompiler.FunctionSpec)
    @ CUDA ~/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:297
 [12] check_cache(cache::Dict{UInt64, Any}, compiler::Any, linker::Any, spec::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12", Tuple{CUDA.CuKernelContext, CuDeviceVector{ComplexF64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF64}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}, prekey::UInt64; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:40
 [13] cached_compilation(cache::Dict{UInt64, Any}, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link), spec::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12", Tuple{CUDA.CuKernelContext, CuDeviceVector{ComplexF64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF64}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ GPUCompiler ~/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:60
 [14] cached_compilation(cache::Dict{UInt64, Any}, compiler::Function, linker::Function, spec::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12", Tuple{CUDA.CuKernelContext, CuDeviceVector{ComplexF64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF64}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:65
 [15] cufunction(f::GPUArrays.var"#broadcast_kernel#12", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{ComplexF64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF64}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:289
 [16] cufunction(f::GPUArrays.var"#broadcast_kernel#12", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{ComplexF64, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF64}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}})
    @ CUDA ~/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:286
 [17] macro expansion
    @ ~/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:100 [inlined]
 [18] #launch_heuristic#857
    @ ~/.julia/packages/CUDA/wTQsK/src/gpuarrays.jl:17 [inlined]
 [19] launch_heuristic
    @ ~/.julia/packages/CUDA/wTQsK/src/gpuarrays.jl:17 [inlined]
 [20] copyto!
    @ ~/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:66 [inlined]
 [21] copyto!
    @ ./broadcast.jl:936 [inlined]
 [22] copy
    @ ./broadcast.jl:908 [inlined]
 [23] materialize
    @ ./broadcast.jl:883 [inlined]
 [24] broadcast
    @ ./broadcast.jl:821 [inlined]
 [25] copy1(#unused#::Type{ComplexF64}, x::CuArray{Int64, 1})
    @ CUDA.CUFFT ~/.julia/packages/CUDA/wTQsK/lib/cufft/util.jl:27
 [26] complexfloat
    @ ~/.julia/packages/CUDA/wTQsK/lib/cufft/util.jl:20 [inlined]
 [27] fft
    @ ~/.julia/packages/CUDA/wTQsK/lib/cufft/fft.jl:238 [inlined]
 [28] top-level scope
    @ In[9]:1
 [29] eval
    @ ./boot.jl:360 [inlined]
 [30] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1090
Calling it a second time:
MethodError: no method matching Base.CodegenParams(; track_allocations=false, code_coverage=false, static_alloc=false, prefer_specsig=true, emit_function=GPUCompiler.var"#hook_emit_function#45"{GPUCompiler.MethodCompileTracer}(GPUCompiler.MethodCompileTracer(PTX CompilerJob of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceMatrix{ComplexF32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF32}, Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64) for sm_75, Core.MethodInstance[MethodInstance for (::GPUArrays.var"#broadcast_kernel#12")(::CUDA.CuKernelContext, ::CuDeviceMatrix{ComplexF32, 1}, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF32}, Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, ::Int64)], #undef)), emitted_function=GPUCompiler.var"#hook_emitted_function#46"{GPUCompiler.MethodCompileTracer}(GPUCompiler.MethodCompileTracer(PTX CompilerJob of kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceMatrix{ComplexF32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF32}, Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64) for sm_75, Core.MethodInstance[MethodInstance for (::GPUArrays.var"#broadcast_kernel#12")(::CUDA.CuKernelContext, ::CuDeviceMatrix{ComplexF32, 1}, ::Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF32}, Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, ::Int64)], #undef)), gnu_pubnames=false, debug_info_kind=0)
Closest candidates are:
  Base.CodegenParams(; track_allocations, code_coverage, prefer_specsig, gnu_pubnames, debug_info_kind, lookup, generic_context) at reflection.jl:1023 got unsupported keyword arguments "static_alloc", "emit_function", "emitted_function"

Stacktrace:
  [1] kwerr(kw::NamedTuple{(:track_allocations, :code_coverage, :static_alloc, :prefer_specsig, :emit_function, :emitted_function, :gnu_pubnames, :debug_info_kind), Tuple{Bool, Bool, Bool, Bool, GPUCompiler.var"#hook_emit_function#45"{GPUCompiler.MethodCompileTracer}, GPUCompiler.var"#hook_emitted_function#46"{GPUCompiler.MethodCompileTracer}, Bool, Int32}}, args::Type)
    @ Base ./error.jl:157
  [2] compile_method_instance(job::GPUCompiler.CompilerJob, method_instance::Core.MethodInstance, world::UInt64)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/irgen.jl:119
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/4QAIk/src/TimerOutput.jl:206 [inlined]
  [4] irgen(job::GPUCompiler.CompilerJob, method_instance::Core.MethodInstance, world::UInt64)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/irgen.jl:334
  [5] macro expansion
    @ ~/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:94 [inlined]
  [6] macro expansion
    @ ~/.julia/packages/TimerOutputs/4QAIk/src/TimerOutput.jl:206 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:93
  [8] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:39
  [9] compile
    @ ~/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:35 [inlined]
 [10] cufunction_compile(source::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:302
 [11] cufunction_compile(source::GPUCompiler.FunctionSpec)
    @ CUDA ~/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:297
 [12] check_cache(cache::Dict{UInt64, Any}, compiler::Any, linker::Any, spec::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{ComplexF32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF32}, Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}, prekey::UInt64; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:40
 [13] cached_compilation(cache::Dict{UInt64, Any}, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link), spec::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{ComplexF32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF32}, Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ GPUCompiler ~/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:60
 [14] cached_compilation(cache::Dict{UInt64, Any}, compiler::Function, linker::Function, spec::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12", Tuple{CUDA.CuKernelContext, CuDeviceMatrix{ComplexF32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF32}, Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:65
 [15] cufunction(f::GPUArrays.var"#broadcast_kernel#12", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{ComplexF32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF32}, Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:289
 [16] cufunction(f::GPUArrays.var"#broadcast_kernel#12", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceMatrix{ComplexF32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, CUDA.CUFFT.var"#65#66"{ComplexF32}, Tuple{Base.Broadcast.Extruded{CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}})
    @ CUDA ~/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:286
 [17] macro expansion
    @ ~/.julia/packages/CUDA/wTQsK/src/compiler/execution.jl:100 [inlined]
 [18] #launch_heuristic#857
    @ ~/.julia/packages/CUDA/wTQsK/src/gpuarrays.jl:17 [inlined]
 [19] launch_heuristic
    @ ~/.julia/packages/CUDA/wTQsK/src/gpuarrays.jl:17 [inlined]
 [20] copyto!
    @ ~/.julia/packages/GPUArrays/WV76E/src/host/broadcast.jl:66 [inlined]
 [21] copyto!
    @ ./broadcast.jl:936 [inlined]
 [22] copy
    @ ./broadcast.jl:908 [inlined]
 [23] materialize
    @ ./broadcast.jl:883 [inlined]
 [24] broadcast(f::CUDA.CUFFT.var"#65#66"{ComplexF32}, As::CuArray{Float32, 2})
    @ Base.Broadcast ./broadcast.jl:821
 [25] copy1(#unused#::Type{ComplexF32}, x::CuArray{Float32, 2})
    @ CUDA.CUFFT ~/.julia/packages/CUDA/wTQsK/lib/cufft/util.jl:27
 [26] complexfloat
    @ ~/.julia/packages/CUDA/wTQsK/lib/cufft/util.jl:20 [inlined]
 [27] fft(x::CuArray{Float32, 2}, region::UnitRange{Int64})
    @ CUDA.CUFFT ~/.julia/packages/CUDA/wTQsK/lib/cufft/fft.jl:238
 [28] top-level scope
    @ In[10]:1
 [29] eval
    @ ./boot.jl:360 [inlined]
 [30] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1090

From then on, CUDA is broken in my entire Jupyter setup. Even something as simple as CUDA.rand(2,2) .+ 1 in a fresh (!) IJulia notebook fails with the same or similar errors. If I delete the IJulia and CUDA folders under ~/.julia/compiled/v1.6, it seems to work again, until I call the problematic parts.

If I call exactly the same functions from the REPL, it works fine.

So my questions:

  • Are there any known issues with IJulia together with CUDA (and possibly CUFFT)?
  • What can I do to get more informative error messages? Deleting everything, precompiling and rerunning takes quite long (1.6 is faster, but still :smiley_cat:)…
  • Can dividing by 0 cause such an error? The errors seem to be input dependent, and I possibly divide by 0 somewhere. Trying that separately wasn’t problematic, though.

I’m sorry that I can’t provide more information for now :confused:

Thanks,

Felix

This is a simple version incompatibility between Julia and CUDA.jl (manifesting as a compiler error because GPUCompiler.jl, a dependency of CUDA.jl, does some fancy things that only work on certain Julia versions). What versions of Julia and CUDA.jl are you using?
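A minimal sketch for answering that question with the versions that are actually loaded, rather than the ones you expect. It uses Pkg's manifest-mode status; the three package names are taken from the stack traces above.

```julia
using Pkg
using InteractiveUtils  # provides versioninfo(); already loaded in the REPL and IJulia

# Julia version, OS and CPU of this session
versioninfo()

# Versions recorded in the *active* Manifest.toml, i.e. what `using CUDA`
# will actually load in this session (not what a README claims is tested).
Pkg.status(["CUDA", "GPUCompiler", "GPUArrays"]; mode = Pkg.PKGMODE_MANIFEST)
```

With CUDA.jl loaded, `CUDA.versioninfo()` additionally reports the CUDA toolkit and driver versions.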


Currently Julia 1.6-rc1 and CUDA 2.6.1.

According to GitHub, this version is tested against 1.6.
So how do I actually know which versions are compatible?

Out of curiosity: what is the reason that it does work in the REPL?

You are using an old version of GPUCompiler. Maybe you baked it into a sysimg at some point?
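The traces above load GPUCompiler from ~/.julia/packages/GPUCompiler/uTpNx/. One way to see why the notebook and the REPL differ is to compare, in both sessions, the active project, the load path, the system image and the file that `using GPUCompiler` would load. The sketch below relies on `Base.active_project`, `Base.load_path`, `Base.find_package` and `Base.JLOptions().image_file`; some of these are internal, so treat it as a diagnostic aid, not a stable interface. The helper name `session_info` is hypothetical. A plausible cause of the difference (an assumption worth checking in the `argv` entry of your kernel.json) is that recent IJulia kernels are installed with `--project=@.`, which makes a notebook pick up any Project.toml/Manifest.toml in its folder or a parent folder, while a REPL started elsewhere uses the default environment.

```julia
# Hypothetical helper: collect what determines which package versions get loaded.
function session_info(pkg::AbstractString = "GPUCompiler")
    return (
        project  = Base.active_project(),                        # active Project.toml
        loadpath = Base.load_path(),                             # expanded LOAD_PATH
        sysimage = unsafe_string(Base.JLOptions().image_file),   # sysimg in use
        pkgfile  = Base.find_package(pkg),                       # entry file, or nothing
    )
end

# Run in a notebook cell and in the REPL, then compare the output line by line.
for (k, v) in pairs(session_info())
    println(rpad(string(k), 9), ": ", v === nothing ? "not found" : v)
end
```

If `sysimage` points to a custom image, or `pkgfile` points to a different GPUCompiler slug than in the REPL, the two sessions are not running the same code.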

Hm, doesn’t CUDA v2.6.1 require GPUCompiler v0.10.0, which is the most recent one? In my Julia v1.6 environment it’s using that one.

Or are you referring to a system-wide CUDA library?

You’re not using it. The static_alloc kwarg in the error was removed in GPUCompiler 0.9 (commit JuliaGPU/GPUCompiler.jl@6f26b5d, “Adapt to upcoming changes”).
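Since `static_alloc` disappeared in GPUCompiler 0.9, a stale environment can be spotted by reading the GPUCompiler version straight from a Manifest file. A minimal sketch using the TOML standard library (available since Julia 1.6); the function `manifest_version` is hypothetical, and the assumption that the Manifest is named Manifest.toml (not JuliaManifest.toml) is noted in the code.

```julia
using TOML  # standard library since Julia 1.6

"""
    manifest_version(manifest_path, pkg)

Hypothetical helper: return the `VersionNumber` of `pkg` recorded in the
Manifest at `manifest_path`, or `nothing` if the file or entry is missing.
Handles the Julia 1.6 layout (packages at the top level) and the newer
`manifest_format = "2.0"` layout (packages under `[deps]`).
"""
function manifest_version(manifest_path::AbstractString, pkg::AbstractString)
    isfile(manifest_path) || return nothing
    data = TOML.parsefile(manifest_path)
    deps = get(data, "manifest_format", "1.0") == "2.0" ? get(data, "deps", Dict{String,Any}()) : data
    entries = get(deps, pkg, nothing)
    entries isa AbstractVector || return nothing
    v = get(first(entries), "version", nothing)
    return v === nothing ? nothing : VersionNumber(v)
end

# Check the Manifest next to the active project (assumed to be named
# Manifest.toml; some projects use JuliaManifest.toml instead).
project = Base.active_project()
if project !== nothing
    manifest = joinpath(dirname(project), "Manifest.toml")
    v = manifest_version(manifest, "GPUCompiler")
    if v !== nothing && v < v"0.9"
        @warn "GPUCompiler $v predates 0.9 (still passes `static_alloc`); this Manifest looks stale" manifest
    else
        @info "GPUCompiler in active Manifest" manifest version = v
    end
end
```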


OK, I believe the error was caused by an old Manifest file in the Jupyter notebook’s directory :expressionless:

Thanks!