I’m doing some simple “get acquainted” experimenting with convolutions using DSP, CUDAnative, CUDAdrv and CuArrays. I create random 3-d arrays with rand(Float32, N, N, N), then create “device” versions of them by calling cu on each:
```julia
using DSP, CuArrays, BenchmarkTools

A = rand(Float32, N, N, N);
B = rand(Float32, N, N, N);
A_d = cu(A);
B_d = cu(B);
```
I’ve written a simple function to perform a convolution on a pair of arrays:
```julia
function cuFFT(A, B)
    C = conv(A, B)
    finalize(C)    # free C's buffer eagerly between benchmark runs
    C = nothing
end
```
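If the benchmark loop is piling up device allocations, a variant that also forces a garbage collection after finalizing might help. This is just a sketch; I’m assuming a GC.gc() is enough for CuArrays to actually reclaim the freed buffers:

```julia
function cuFFT_gc(A, B)
    C = conv(A, B)
    finalize(C)   # run C's finalizer right away instead of waiting for GC
    GC.gc()       # full collection, in case temporaries still hold GPU buffers
    return nothing
end
```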
Finally, I use BenchmarkTools’s @benchmark macro:
```julia
@benchmark cuFFT($A_d, $B_d)
```
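In case the default benchmarking loop is part of the problem, here is a sketch that caps the number of samples and runs a collection between them. samples, evals, and teardown are standard BenchmarkTools keywords, but whether the teardown actually releases GPU memory between samples is my assumption:

```julia
@benchmark cuFFT($A_d, $B_d) teardown=(GC.gc()) samples=50 evals=1
```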
If I set N to, say, 64, Julia returns this error:
```
ERROR: LoadError: CUFFTError(code 2, cuFFT failed to allocate GPU or CPU memory)
```
However, if I set N to 120, my script runs to completion.
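Back-of-the-envelope, the arrays seem far too small to exhaust GPU memory on their own, assuming conv returns the full (N + M − 1)-sized convolution:

```julia
N = 64
input_bytes  = N^3 * sizeof(Float32)          # 1,048,576 B ≈ 1.0 MiB per input
output_bytes = (2N - 1)^3 * sizeof(Float32)   # 8,193,532 B ≈ 7.8 MiB for the result
```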
When I originally posted my question, I was calling fft() directly. As I continued experimenting, I found that I was getting inconsistent results from run to run. However, once I loaded DSP and called conv(), the behavior became consistent, leaving the new puzzle that a larger N succeeds where a smaller N crashes. I also realized that I’d been assuming the problem was GPU memory, even though the error message says “GPU or CPU”.
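One way to test that assumption would be to watch free device memory around each call. I believe CUDAdrv exposes Mem.info() for this, though that’s an assumption about the version I have installed:

```julia
using CUDAdrv

free, total = CUDAdrv.Mem.info()   # bytes free and total on the current device
@show free total
C = conv(A_d, B_d)
@show CUDAdrv.Mem.info()[1]        # free bytes after one convolution
```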
My question: is there some problem in the way that I’m calling DSP.conv(), or some setup that I need to do with BenchmarkTools?