Hello,
A question for someone who has experience writing/debugging GPU code.
Consider something like this:
```julia
using CuArrays, Flux
lstm = LSTM(5, 3) |> gpu
data = [rand(5) for i in 1:10]
data = gpu.(data)
out = lstm.(data) # SIC!
```
I need to broadcast the `lstm` call in order to make use of the stateful properties of the RNN. `data` is a vanilla `Vector` (not a `CuArray`); each element of `data` is a `CuArray`.
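For context on why I broadcast: on the CPU, broadcasting a stateful callable over a `Vector` behaves like calling it once per element, so hidden state carries across time steps. A minimal CPU-only sketch with a toy stateful callable (`ToyRNN` is a made-up stand-in, not Flux code):

```julia
# A toy "RNN cell" whose only state is a running sum.
mutable struct ToyRNN
    h::Float64
end
(m::ToyRNN)(x) = (m.h += x; m.h)  # each call updates and returns the state

m = ToyRNN(0.0)
out = m.(1:3)  # equivalent to calling m(1), m(2), m(3) in turn
# out == [1.0, 3.0, 6.0] -- state accumulates across elements
# (note: left-to-right evaluation holds in practice for a Vector,
#  though broadcast does not formally guarantee an order)
```

This is the behaviour I rely on with `lstm.(data)`; my question is what happens when the elements are `CuArray`s.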
How will this broadcast be handled by Flux and CUDAnative? Will the RNN indeed be executed on the GPU? Will this cause any unnecessary copying of data between the GPU and the CPU between invocations of the individual RNN cells?
Thanks in advance.