I’m doing some simple “get acquainted” experimenting with convolutions using DSP, CUDAnative, CUDAdrv and CuArrays. I create random 3-d arrays with rand(Float32, N, N, N), then create “device” versions of them by calling cu on each:
```julia
using DSP, CuArrays, BenchmarkTools

A = rand(Float32, N, N, N);
B = rand(Float32, N, N, N);
A_d = cu(A);
B_d = cu(B);
```
I’ve written a simple function to perform a convolution on a pair of arrays:
```julia
function cuFFT(A, B)
    C = conv(A, B)
    finalize(C)    # free C's buffer eagerly between benchmark runs
    C = nothing
end
```
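If the benchmark loop is piling up device allocations, a variant that also forces a garbage collection after finalizing might help. This is just a sketch; I’m assuming a GC.gc() is enough for CuArrays to actually reclaim the freed buffers:

```julia
function cuFFT_gc(A, B)
    C = conv(A, B)
    finalize(C)   # run C's finalizer right away instead of waiting for GC
    GC.gc()       # full collection, in case temporaries still hold GPU buffers
    return nothing
end
```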
Finally, I use BenchmarkTools’s @benchmark macro:
```julia
@benchmark cuFFT($A_d, $B_d)
```
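In case the default benchmarking loop is part of the problem, here is a sketch that caps the number of samples and runs a collection between them. samples, evals, and teardown are standard BenchmarkTools keywords, but whether the teardown actually releases GPU memory between samples is my assumption:

```julia
@benchmark cuFFT($A_d, $B_d) teardown=(GC.gc()) samples=50 evals=1
```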
If I set N to, say, 64, Julia returns this error:
```
ERROR: LoadError: CUFFTError(code 2, cuFFT failed to allocate GPU or CPU memory)
```
However, if I set N to 120, my script runs to completion.
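Back-of-the-envelope, the arrays seem far too small to exhaust GPU memory on their own, assuming conv returns the full (N + M − 1)-sized convolution:

```julia
N = 64
input_bytes  = N^3 * sizeof(Float32)          # 1,048,576 B ≈ 1.0 MiB per input
output_bytes = (2N - 1)^3 * sizeof(Float32)   # 8,193,532 B ≈ 7.8 MiB for the result
```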
When I originally posted my question, I was calling fft() directly. As I continued experimenting, I found that I was getting inconsistent results from run to run. However, once I loaded DSP and called conv(), the behavior became consistent, leaving the new puzzle that a larger N succeeds where a smaller N crashes. I also realized that I’d been assuming the problem was GPU memory, even though the error message says “GPU or CPU”.
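One way to test that assumption would be to watch free device memory around each call. I believe CUDAdrv exposes Mem.info() for this, though that’s an assumption about the version I have installed:

```julia
using CUDAdrv

free, total = CUDAdrv.Mem.info()   # bytes free and total on the current device
@show free total
C = conv(A_d, B_d)
@show CUDAdrv.Mem.info()[1]        # free bytes after one convolution
```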
My question: is there some problem in the way that I’m calling DSP.conv(), or some setup that I need to do with BenchmarkTools?