Say you have a sample of 3 observation–label pairs as a training set for a standard fully connected NN. To parallelize the computation of the loss (and the gradient) for each label, you can feed the three observations to the NN as a matrix instead of sequentially feeding the three observation vectors. Like so:
```julia
net = Chain(Dense(10, 5, relu), Dense(5, 1, relu)) |> gpu
x = cu(rand(10, 3))  # each column of the matrix is one observation
y = cu(rand(1, 3))   # each element is one label
output = net(x)
loss = Flux.mse(output, y)
```
and that’s it: the GPU matrix multiplication takes care of the parallelization. However, as I understand it, an RNN changes the internal state of the network depending on its previous inputs. That means that if I input 3 different observations (of potentially different lengths), then to parallelize the computation of the gradients, the states of three different networks would have to be kept in memory. Isn’t that a significant drawback of RNNs? Does Flux implement parallel feedforward passes for RNNs, or do I have to write a GPU kernel or something?
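For what it’s worth, here is a minimal sketch of what batched recurrent evaluation looks like in Flux, assuming equal-length sequences and the `RNN(in => out)` constructor of recent Flux versions. The idea is that a recurrent layer accepts a matrix per timestep (one column per sequence), so the hidden state is itself a matrix with one column per sequence, rather than three separate networks. The layer sizes and batch size below are illustrative, not from the original post.

```julia
using Flux

# Hedged sketch: a recurrent layer with 10 input features and 5 hidden units.
# Recent Flux uses the `in => out` pair syntax; older versions used RNN(10, 5).
rnn = RNN(10 => 5)

# One timestep for a batch of 3 sequences: each column is one sequence's input.
x_t = rand(Float32, 10, 3)

# The returned hidden state is 5×3 — one state column per sequence,
# so a single layer carries all three "networks" at once.
h_t = rnn(x_t)
```

Variable-length sequences are the awkward part: to batch them you typically pad them to a common length (and mask the padded steps in the loss), since the state matrix must keep one column per sequence at every timestep.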