Hello,
I’m very happy to be using ParallelStencil.jl. The CPU version (i.e. USE_GPU = false) works fine. However, when I run the example script diffusion3D_novis_noperf.jl on the GPU (i.e. USE_GPU = true), I get the following error:
ERROR: LoadError: MethodError: no method matching return_types(::GPUArrays.var"#5#6", ::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 3, 1}, Float64}}, ::GPUCompiler.GPUInterpreter)
Closest candidates are:
return_types(::Any, ::Any; world, interp) at reflection.jl:1294
return_types(::Any) at reflection.jl:1294
Stacktrace:
[1] check_method(job::GPUCompiler.CompilerJob)
@ GPUCompiler C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\validation.jl:19
[2] macro expansion
@ C:\Users\***\.julia\packages\TimerOutputs\LHjFw\src\TimerOutput.jl:253 [inlined]
[3] macro expansion
@ C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\driver.jl:89 [inlined]
[4] emit_julia(job::GPUCompiler.CompilerJob)
@ GPUCompiler C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\utils.jl:64
[5] cufunction_compile(job::GPUCompiler.CompilerJob)
@ CUDA C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:324
[6] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler C:\Users\***\.julia\packages\GPUCompiler\I9fZc\src\cache.jl:90
[7] cufunction(f::GPUArrays.var"#5#6", tt::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceArray{Float64, 3, 1}, Float64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:297
[8] cufunction
@ C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:290 [inlined]
[9] macro expansion
@ C:\Users\***\.julia\packages\CUDA\5jdFl\src\compiler\execution.jl:102 [inlined]
[10] #launch_heuristic#282
@ C:\Users\***\.julia\packages\CUDA\5jdFl\src\gpuarrays.jl:17 [inlined]
[11] gpu_call(::GPUArrays.var"#5#6", ::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, ::Float64; target::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, elements::Nothing, threads::Nothing, blocks::Nothing, name::Nothing)
@ GPUArrays C:\Users\***\.julia\packages\GPUArrays\TnEpb\src\device\execution.jl:61
[12] gpu_call
@ C:\Users\***\.julia\packages\GPUArrays\TnEpb\src\device\execution.jl:34 [inlined]
[13] fill!(A::CUDA.CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}, x::Float64)
@ GPUArrays C:\Users\***\.julia\packages\GPUArrays\TnEpb\src\host\construction.jl:14
[14] zeros(::Type, ::Int64, ::Vararg{Int64})
@ CUDA C:\Users\***\.julia\packages\CUDA\5jdFl\src\array.jl:551
[15] diffusion3D()
@ Main c:\Users\***\***\test_GPU_parallel_stencil.jl:29
[16] top-level scope
@ c:\Users\***\***\test_GPU_parallel_stencil.jl:49
in expression starting at c:\Users\***\***\test_GPU_parallel_stencil.jl:49
I also get the same error when I test ParallelStencil.jl in the package manager:
(@v1.8) pkg> test ParallelStencil
Regarding the GPU on my laptop, here is the CUDA information I checked:
julia> using CUDA
julia> CUDA.versioninfo()
CUDA toolkit 11.6, artifact installation
NVIDIA driver 527.37.0, for CUDA 12.0
CUDA driver 12.0
Libraries:
- CUBLAS: 11.8.1
- CURAND: 10.2.9
- CUFFT: 10.7.0
- CUSOLVER: 11.3.2
- CUSPARSE: 11.7.1
- CUPTI: 16.0.0
- NVML: 12.0.0+527.37
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.8.1
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
1 device:
0: NVIDIA GeForce GTX 1050 (sm_61, 2.927 GiB / 3.000 GiB available)
Thank you very much in advance for any help or input on getting it to run on the GPU.