Flux RNN on a GPU - unnecessary copying

Hello,

A question for someone who has experience writing/debugging GPU code.

Consider something like this:

```julia
using CuArrays, Flux

lstm = LSTM(5, 3) |> gpu         # move the layer's parameters to the GPU
data = [rand(5) for i = 1:10]    # a plain Vector of ten CPU vectors
data = gpu.(data)                # each element becomes a CuArray
out  = lstm.(data)               # sic: broadcasting the stateful call over the sequence
```

I need to broadcast the lstm call over the sequence in order to make use of the stateful properties of the RNN: `data` is a vanilla `Vector` (not a `CuArray`), while each of its elements is a `CuArray`.
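
For clarity, my mental model is that the broadcast is just shorthand for walking the sequence in order while the layer carries its hidden state between calls; a rough sketch of what I assume it amounts to:

```julia
Flux.reset!(lstm)                # start from the layer's initial hidden state
out = [lstm(x) for x in data]    # each x is already a CuArray; the state is carried over between calls
```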

How will this broadcast be handled by Flux and CUDAnative? Will the RNN indeed be executed on the GPU? Will this cause any unnecessary copying of data from the GPU to the CPU and back between invocations of the individual RNN cells?
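
For what it's worth, the only check I have done so far is to look at the output types and to forbid scalar indexing, which (if I understand the CuArrays API correctly) should at least catch silent fallbacks to generic CPU-style loops:

```julia
CuArrays.allowscalar(false)   # error instead of silently doing slow scalar indexing on GPU arrays
out = lstm.(data)
println(typeof(first(out)))   # I expect something wrapping a CuArray
```

That still doesn't tell me whether anything gets copied back and forth between the individual calls, hence the question.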

Thanks in advance.