Hello,
I have an edge-detection code using ParallelStencil.jl. The algorithm is basic: it compares each pixel to its left and bottom neighbors, and if the difference is above a threshold, I fill the output image (matrix) with a different color (value) to indicate an edge.
using Images
using ParallelStencil

const USE_GPU = false
@static if USE_GPU
    @init_parallel_stencil(CUDA, Float64, 2);
else
    @init_parallel_stencil(Threads, Float64, 2);
end
@parallel_indices (x,y) function getEdges!(originalImage, newImage, threshold)
    if ( x >= 2 && x <= size(originalImage,1) &&
         y >= 2 && y <= size(originalImage,2) )
        currentPixel = originalImage[x,y]
        leftPixel    = originalImage[x-1,y]
        bottomPixel  = originalImage[x,y-1]
        if abs(currentPixel-leftPixel) ≥ threshold || abs(currentPixel-bottomPixel) ≥ threshold
            newImage[x-1,y-1] = 1.0
        end
    end
    return
end
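For clarity, this is the serial equivalent of what the kernel is meant to compute. It is just a sketch for this post; the name getEdgesSerial! is made up and is not part of my actual code:

```julia
# Plain-Julia sketch of the same neighbor comparison (CPU only, for reference).
function getEdgesSerial!(originalImage, newImage, threshold)
    for y in 2:size(originalImage, 2), x in 2:size(originalImage, 1)
        currentPixel = originalImage[x, y]
        leftPixel    = originalImage[x-1, y]
        bottomPixel  = originalImage[x, y-1]
        if abs(currentPixel - leftPixel) >= threshold || abs(currentPixel - bottomPixel) >= threshold
            newImage[x-1, y-1] = 1.0
        end
    end
    return
end
```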
Now I load the image:
originalImage = load("image2.png")
nx, ny = size(originalImage)
imageInput = @zeros(nx,ny)
imageInput .= red.(originalImage) # for my particular image, the red channel was enough
# using CUDA: CuArray
# imageInput .= CuArray(red.(originalImage))
imageOutput = @zeros(nx,ny)
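In case anyone wants to try to reproduce this without my PNG, a synthetic input along the following lines should exercise the same code path (the sizes and the name hostData are made up for this post):

```julia
# Hypothetical stand-in for image2.png, just to reproduce the kernel launch.
nx, ny = 1200, 800                                                # arbitrary test size
hostData = [Float64((i + j) % 32 < 2) for i in 1:nx, j in 1:ny]   # a few sharp transitions
imageInput = @zeros(nx, ny)
copyto!(imageInput, hostData)   # copyto! works for both Array and CuArray destinations
imageOutput = @zeros(nx, ny)
```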
Then I define some grid and block parameters:
threshold = 0.02 # value depends on the image
BLOCKX = 512
BLOCKY = 512
GRIDX = nx÷BLOCKX + 1
GRIDY = ny÷BLOCKY + 1
cuthreads = (BLOCKX, BLOCKY, 1)
cublocks = (GRIDX, GRIDY, 1)
@time @parallel cublocks cuthreads getEdges!(imageInput, imageOutput, threshold)
[RGB.(Array(imageInput)); RGB.(Array(imageOutput))] # check results
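Just to make explicit what launch configuration this requests, here is what the formulas above evaluate to for a hypothetical 1200×800 input (not my real image size):

```julia
# Hypothetical numbers for nx, ny = 1200, 800:
GRIDX = 1200 ÷ 512 + 1     # = 3
GRIDY =  800 ÷ 512 + 1     # = 2
cuthreads = (512, 512, 1)  # 512*512 = 262144 threads per block requested
cublocks  = (3, 2, 1)      # 3*2 = 6 blocks
```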
With Threads, everything works fine, but I want to run with CUDA for comparison, and this is where the issues happen.
If I set const USE_GPU = true and uncomment the lines that convert my image into a CuArray, I get the error:
ERROR: CUDA error: invalid argument (code 1, ERROR_INVALID_VALUE)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/.julia/packages/CUDA/5jdFl/lib/cudadrv/error.jl:91
[2] macro expansion
@ ~/.julia/packages/CUDA/5jdFl/lib/cudadrv/error.jl:101 [inlined]
[3] cuLaunchKernel(f::CUDA.CuFunction, gridDimX::UInt32, gridDimY::UInt32, gridDimZ::UInt32, blockDimX::UInt32, blockDimY::UInt32, blockDimZ::UInt32, sharedMemBytes::Int64, hStream::CUDA.CuStream, kernelParams::Vector{Ptr{Nothing}}, extra::Ptr{Nothing})
@ CUDA ~/.julia/packages/CUDA/5jdFl/lib/utils/call.jl:26
[4] #39
@ ~/.julia/packages/CUDA/5jdFl/lib/cudadrv/execution.jl:69 [inlined]
[5] macro expansion
@ ~/.julia/packages/CUDA/5jdFl/lib/cudadrv/execution.jl:33 [inlined]
[6] macro expansion
@ ./none:0 [inlined]
[7] pack_arguments(::CUDA.var"#39#40"{Bool, Int64, CUDA.CuStream, CUDA.CuFunction, CUDA.CuDim3, CUDA.CuDim3}, ::CUDA.KernelState, ::CUDA.CuDeviceMatrix{Float64, 1}, ::CUDA.CuDeviceMatrix{Float64, 1}, ::Float64, ::Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, ::Int64, ::Int64, ::Int64)
@ CUDA ./none:0
[8] #launch#38
@ ~/.julia/packages/CUDA/5jdFl/lib/cudadrv/execution.jl:62 [inlined]
[9] #44
@ ~/.julia/packages/CUDA/5jdFl/lib/cudadrv/execution.jl:136 [inlined]
[10] macro expansion
@ ~/.julia/packages/CUDA/5jdFl/lib/cudadrv/execution.jl:95 [inlined]
[11] macro expansion
@ ./none:0 [inlined]
[12] convert_arguments
@ ./none:0 [inlined]
[13] #cudacall#43
@ ~/.julia/packages/CUDA/5jdFl/lib/cudadrv/execution.jl:135 [inlined]
[14] macro expansion
@ ~/.julia/packages/CUDA/5jdFl/src/compiler/execution.jl:204 [inlined]
[15] macro expansion
@ ./none:0 [inlined]
[16] call(::CUDA.HostKernel{typeof(getEdges!), Tuple{CUDA.CuDeviceMatrix{Float64, 1}, CUDA.CuDeviceMatrix{Float64, 1}, Float64, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, Int64, Int64, Int64}}, ::CUDA.CuDeviceMatrix{Float64, 1}, ::CUDA.CuDeviceMatrix{Float64, 1}, ::Float64, ::Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, ::Int64, ::Int64, ::Int64; call_kwargs::Base.Pairs{Symbol, Tuple{Int64, Int64, Int64}, Tuple{Symbol, Symbol}, NamedTuple{(:threads, :blocks), Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}}}})
@ CUDA ./none:0
[17] (::CUDA.HostKernel{typeof(getEdges!), Tuple{CUDA.CuDeviceMatrix{Float64, 1}, CUDA.CuDeviceMatrix{Float64, 1}, Float64, Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}, Int64, Int64, Int64}})(::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, ::Vararg{Any}; threads::Tuple{Int64, Int64, Int64}, blocks::Tuple{Int64, Int64, Int64}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/packages/CUDA/5jdFl/src/compiler/execution.jl:462
[18] macro expansion
@ ~/.julia/packages/CUDA/5jdFl/src/compiler/execution.jl:104 [inlined]
I know that my GPU itself has some issues, so: can anyone reproduce my error?
I also know that I don't know GPU kernel programming, but this code looks so simple that I naively thought no errors would appear. Can anyone tell me what I am doing wrong?
Thank you for your attention
Notes:
- Julia v1.7.2
- CUDA.jl v3.8.5
- Nvidia GTX 1080 Ti
- ParallelStencil.jl v0.6.0