Use ParallelStencil.jl with GPU

Hello,

I’m very happy to use ParallelStencil.jl. The CPU version (i.e. USE_GPU = false) works fine. However, when I try to run the example script diffusion3D_novis_noperf.jl with the GPU (i.e. USE_GPU = true), I get the error below.

ERROR: LoadError: MethodError: no method matching return_types(::GPUArrays.var"#5#6", ::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 3, 1}, Float64}}, ::GPUCompiler.GPUInterpreter)
Closest candidates are:
  return_types(::Any, ::Any; world, interp) at reflection.jl:1294
  return_types(::Any) at reflection.jl:1294
Stacktrace:
  [1] check_method(job::GPUCompiler.CompilerJob)
    @ GPUCompiler C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\validation.jl:19
  [2] macro expansion
    @ C:\Users\***\.julia\packages\TimerOutputs\LHjFw\src\TimerOutput.jl:253 [inlined]
  [3] macro expansion
    @ C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\driver.jl:89 [inlined]
  [4] emit_julia(job::GPUCompiler.CompilerJob)
    @ GPUCompiler C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\utils.jl:64
  [5] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:324
  [6] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\cache.jl:90
  [7] cufunction(f::GPUArrays.var"#5#6", tt::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 3, 1}, Float64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:297
  [8] cufunction
    @ C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:290 [inlined]
  [9] macro expansion
    @ C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:102 [inlined]
 [10] #launch_heuristic#282
    @ C:\Users\***\.julia\packages\CUDA\5jdFl\src\gpuarrays.jl:17 [inlined]
 [11] gpu_call(::GPUArrays.var"#5#6", ::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, ::Float64; target::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, elements::Nothing, threads::Nothing, blocks::Nothing, name::Nothing)
    @ GPUArrays C:\Users\***\.julia\packages\GPUArrays\TnEpb\src\device\execution.jl:61
 [12] gpu_call
    @ C:\Users\***\.julia\packages\GPUArrays\TnEpb\src\device\execution.jl:34 [inlined]
 [13] fill!(A::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, x::Float64)
    @ GPUArrays C:\Users\***\.julia\packages\GPUArrays\TnEpb\src\host\construction.jl:14
 [14] zeros(::Type, ::Int64, ::Vararg{Int64})
    @ CUDA C:\Users\***\.julia\packages\CUDA\5jdFl\src\array.jl:551
 [15] diffusion3D()
    @ Main c:\Users\***\***\test_GPU_parallel_stencil.jl:29
 [16] top-level scope
    @ c:\Users\***\***\test_GPU_parallel_stencil.jl:49
in expression starting at c:\Users\***\***\test_GPU_parallel_stencil.jl:49

I also get the same error when I run the ParallelStencil.jl tests from the package manager:

(@v1.8) pkg> test ParallelStencil

Regarding the GPU on my laptop, this is what CUDA.versioninfo() reports:

julia> using CUDA

julia> CUDA.versioninfo()
CUDA toolkit 11.6, artifact installation
NVIDIA driver 527.37.0, for CUDA 12.0
CUDA driver 12.0

Libraries:
- CUBLAS: 11.8.1
- CURAND: 10.2.9
- CUFFT: 10.7.0
- CUSOLVER: 11.3.2
- CUSPARSE: 11.7.1
- CUPTI: 16.0.0
- NVML: 12.0.0+527.37
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.1
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: NVIDIA GeForce GTX 1050 (sm_61, 2.927 GiB / 3.000 GiB available)

Thank you very much in advance for your help in getting it to run on the GPU.

Hi, thanks for your feedback and for reaching out! Sorry to read you are having issues running ParallelStencil on your GPU.

I tried to reproduce it on my side, but everything worked as expected. I am on Julia 1.9.3, though:

julia> CUDA.versioninfo()
CUDA runtime 11.4, local installation
CUDA driver 12.1
NVIDIA driver 470.103.1, originally for CUDA 11.4

CUDA libraries: 
- CUBLAS: 11.6.5
- CURAND: 10.2.5
- CUFFT: 10.5.2
- CUSOLVER: 11.2.0
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+470.103.1

Julia packages: 
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0
- CUDA_Runtime_Discovery: 0.2.2

Toolchain:
- Julia: 1.9.3
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 devices:
  0: NVIDIA A100-SXM4-40GB (sm_80, 39.583 GiB / 39.586 GiB available)

Could you try with the latest Julia version? I could also give it a try with Julia 1.8.1.

Thank you very much for the instant reply. I tried it with Julia 1.9.3 and now it works as expected :slight_smile:

However, I don’t know whether it is actually related to the Julia version or whether I didn’t install the previous version correctly. The stack trace suggests that it has something to do with GPUCompiler.

Anyway, thank you very much again, and best regards,

Yeah, could be. Thanks for reporting anyway; I will check on Julia 1.8.x to see whether there is indeed an issue with previous versions. Good to know that 1.9.x solves your issue :slight_smile:
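
For anyone hitting the same error: the stack trace above fails in CUDA's zeros (frames [13] and [14]), before any ParallelStencil kernel runs, so it may help to test CUDA.jl on its own. The following is only a rough sketch of such a check, assuming CUDA.jl is installed in the active environment; the array size is arbitrary, and which package versions actually conflict was not established in this thread.

```julia
using Pkg

# Julia version of the running session (1.8.1 failed and 1.9.3 worked in this thread).
println("Julia ", VERSION)

# Resolved versions of the packages that appear in the stack trace.
# PKGMODE_MANIFEST also lists indirect dependencies such as GPUCompiler.
Pkg.status(["CUDA", "GPUCompiler", "GPUArrays"]; mode = Pkg.PKGMODE_MANIFEST)

using CUDA  # assumption: CUDA.jl is installed in the active environment

if CUDA.functional()
    # Same call path as in the stack trace: zeros -> fill! -> GPU kernel compilation.
    A = CUDA.zeros(Float64, 8, 8, 8)
    println("Allocated a ", size(A), " array on the GPU; sum = ", sum(A))
else
    println("CUDA.jl does not report a functional GPU setup.")
end
```

If this small script already throws the same MethodError, the problem is in the CUDA.jl / GPUCompiler installation rather than in ParallelStencil.jl, and comparing the printed versions between a failing and a working setup is a reasonable next step.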