Found Bug in Flux

Thank you for your response. My MWE does not include the full model chain, so I have included it at the bottom of this post; it should provide some context for why I need the DataLoader (or some other way of slicing) within the model chain.

My initial post on the topic is here: How to take full advantage of GPU Parallelism on Nested Sequential Data in Flux - #4 by jonathan-laurent. In the MWE I gave above, (m, n, p) corresponds to (num_features, max_inner_seq_length, max_outer_seq_length * batch_size) in @jonathan-laurent’s excellent answer.

@jonathan-laurent gave me very helpful advice on how to process this data while taking advantage of GPU parallelism. My DataLoader-based implementation of that advice works up until I try to compute the gradients.

As you can see from his answer, not only do I need to loop over slices at the beginning of the chain (which, I agree, could be done outside the chain), but in the middle of the chain I also need to reshape the data from 2D to 3D and then loop over slices again.

Since this second slicing and reshaping step must happen between my two recurrent models, I cannot avoid having some way to slice 3D data inside the model chain itself. You have made it clear to me that DataLoader is likely not the right tool for this.

I am very much open to suggestions: what would you recommend for slicing a 3D tensor within the model chain, and for reshaping a 2D tensor into a 3D tensor within the model chain, while ensuring the gradient can still be calculated?
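For concreteness, here is a rough sketch of the kind of slicing step I am imagining as a replacement for DataLoader inside the chain. The name slicewise is my own placeholder, and I genuinely do not know whether Zygote can differentiate through this:

# Hypothetical helper: apply `f` to every 2D slice of a 3D tensor
# along the third dimension, then concatenate the outputs back
# into a 3D tensor.
slicewise(f) = d -> cat([f(d[:, :, i]) for i in axes(d, 3)]...; dims=3)

The idea would be that DataLoader, rnns in the chain below becomes a single slicewise(rnns) step, and likewise for rnns2. Is something along these lines workable, or is there a better primitive for it?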

I would also welcome any better ways of implementing @jonathan-laurent’s advice for running my model on the GPU (and thus on tensors).

Please feel free to jump in as well, @jonathan-laurent, if you see a better way to implement your advice!

Thank you,

Jack


For reference, my full model chain at the moment is below. Again, it fully works for computing outputs given inputs, and for computing the loss given inputs and targets, but it fails when computing the gradient.

Please let me know if you have any questions.

c = Chain(
          DataLoader,                       # slice the 3D input (the step in question)
          rnns,                             # first recurrent model
          d -> d[:, :, 1],                  # take the first slice along the 3rd dimension
          d -> reshape(d, (output_feature_len, max_outer_sequence_len, :)),  # 2D -> 3D
          d -> permutedims(d, (1, 3, 2)),   # swap the 2nd and 3rd dimensions
          DataLoader,                       # slice again for the second recurrent model
          rnns2,                            # second recurrent model
          d -> d[:, :, 1],                  # take the first slice along the 3rd dimension
          softmax
) |> gpu
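
For completeness, the gradient computation that fails looks roughly like this. Here X and Y stand in for a padded input batch and its targets, and the crossentropy loss is just illustrative (my real loss differs):

loss(x, y) = Flux.crossentropy(c(x), y)

loss(X, Y)  # works: the forward pass and the loss both evaluate fine

# fails: Zygote cannot differentiate through the DataLoader steps
gs = Flux.gradient(() -> loss(X, Y), Flux.params(c))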