Hello,
I’m very happy to use ParallelStencil.jl. The CPU version (i.e. USE_GPU = false) works fine. However, when I run the example script diffusion3D_novis_noperf.jl on the GPU (i.e. USE_GPU = true), I get the error below:
ERROR: LoadError: MethodError: no method matching return_types(::GPUArrays.var"#5#6", ::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 3, 1}, Float64}}, ::GPUCompiler.GPUInterpreter)
Closest candidates are:
  return_types(::Any, ::Any; world, interp) at reflection.jl:1294
  return_types(::Any) at reflection.jl:1294
Stacktrace:
  [1] check_method(job::GPUCompiler.CompilerJob)
    @ GPUCompiler C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\validation.jl:19
  [2] macro expansion
    @ C:\Users\***\.julia\packages\TimerOutputs\LHjFw\src\TimerOutput.jl:253 [inlined]
  [3] macro expansion
    @ C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\driver.jl:89 [inlined]
  [4] emit_julia(job::GPUCompiler.CompilerJob)
    @ GPUCompiler C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\utils.jl:64
  [5] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:324
  [6] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\cache.jl:90
  [7] cufunction(f::GPUArrays.var"#5#6", tt::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 3, 1}, Float64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:297
  [8] cufunction
    @ C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:290 [inlined]
  [9] macro expansion
    @ C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:102 [inlined]
 [10] #launch_heuristic#282
    @ C:\Users\***\.julia\packages\CUDA\5jdFl\src\gpuarrays.jl:17 [inlined]
 [11] gpu_call(::GPUArrays.var"#5#6", ::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, ::Float64; target::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, elements::Nothing, threads::Nothing, blocks::Nothing, name::Nothing)
    @ GPUArrays C:\Users\***\.julia\packages\GPUArrays\TnEpb\src\device\execution.jl:61
 [12] gpu_call
    @ C:\Users\***\.julia\packages\GPUArrays\TnEpb\src\device\execution.jl:34 [inlined]
 [13] fill!(A::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, x::Float64)
    @ GPUArrays C:\Users\***\.julia\packages\GPUArrays\TnEpb\src\host\construction.jl:14
 [14] zeros(::Type, ::Int64, ::Vararg{Int64})
    @ CUDA C:\Users\***\.julia\packages\CUDA\5jdFl\src\array.jl:551
 [15] diffusion3D()
    @ Main c:\Users\***\***\test_GPU_parallel_stencil.jl:29
 [16] top-level scope
    @ c:\Users\***\***\test_GPU_parallel_stencil.jl:49
in expression starting at c:\Users\***\***\test_GPU_parallel_stencil.jl:49
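Judging from frames [13] and [14] of the stack trace, the failure seems to happen in fill!, called from CUDA.zeros, before any ParallelStencil kernel is launched. If that reading is right, a minimal sketch like the one below might trigger the same error without ParallelStencil. This is an assumption based only on the stack trace: I have not run it separately, and the array size is arbitrary.

```julia
using CUDA

# Assumption: allocating a zero-filled GPU array is enough to hit the error,
# because the stack trace fails in fill! (frame 13), called from CUDA.zeros (frame 14).
# The size 8x8x8 is arbitrary and only for illustration.
A = CUDA.zeros(Float64, 8, 8, 8)

# Only reached if the allocation and fill succeed.
println(size(A))
```

If this fails in the same way, the problem would appear to lie in the CUDA.jl / GPUCompiler installation rather than in ParallelStencil.jl itself.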
I also get the same error when I run the ParallelStencil.jl tests from the package manager:
(@v1.8) pkg> test ParallelStencil
Regarding the GPU on my laptop, this is the CUDA information reported by CUDA.versioninfo():
julia> using CUDA
julia> CUDA.versioninfo()
CUDA toolkit 11.6, artifact installation
NVIDIA driver 527.37.0, for CUDA 12.0
CUDA driver 12.0
Libraries:
- CUBLAS: 11.8.1
- CURAND: 10.2.9
- CUFFT: 10.7.0
- CUSOLVER: 11.3.2
- CUSPARSE: 11.7.1
- CUPTI: 16.0.0
- NVML: 12.0.0+527.37
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.8.1
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
1 device:
  0: NVIDIA GeForce GTX 1050 (sm_61, 2.927 GiB / 3.000 GiB available)
Thank you very much in advance for your help in getting it to run on the GPU.
