Found Bug in Flux

Thank you for your response. My MWE does not include the full model chain, so I have included it at the bottom of this post; it should provide some context for why I need the DataLoader (or some other way of slicing) within the model chain.

My initial post on the topic is here: How to take full advantage of GPU Parallelism on Nested Sequential Data in Flux - #4 by jonathan-laurent. In the MWE I gave above, (m, n, p) corresponds to (num_features, max_inner_seq_length, max_outer_seq_length * batch_size) in @jonathan-laurent’s excellent answer.

@jonathan-laurent gave me very helpful advice on how to process this data while taking advantage of GPU parallelism. My DataLoader-based implementation of that advice works up until I try to compute the gradients.

As you can see from his answer, not only do I need to loop over slices at the beginning of the chain (which, I agree, could be done outside the chain), but in the middle of the chain I also need to reshape the data from 2D to 3D and then loop over slices again.

Since this second slicing and reshaping step must happen between my two recurrent models, I cannot avoid having some way to slice 3D data inside the model chain itself. You have made it clear to me that DataLoader is likely not the right tool for this.

I am very much open to suggestions: what would you recommend for slicing a 3D tensor within the model chain, and for reshaping a 2D tensor into a 3D tensor within the model chain, while ensuring the gradient can still be calculated?
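For concreteness, here is a rough sketch of the kind of slicing step I am imagining as a replacement for DataLoader inside the chain. The name slicewise is my own placeholder, and I genuinely do not know whether Zygote can differentiate through this:

# Hypothetical helper: apply `f` to every 2D slice of a 3D tensor
# along the third dimension, then concatenate the outputs back
# into a 3D tensor.
slicewise(f) = d -> cat([f(d[:, :, i]) for i in axes(d, 3)]...; dims=3)

The idea would be that DataLoader, rnns in the chain below becomes a single slicewise(rnns) step, and likewise for rnns2. Is something along these lines workable, or is there a better primitive for it?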

I would also welcome any better ways of implementing @jonathan-laurent’s advice for running my model on the GPU (and thus on tensors).

Please feel free to jump in as well, @jonathan-laurent, if you see a better way to implement your advice!

Thank you,

Jack


For reference, my full model chain at the moment is below. Again, it fully works for computing outputs given inputs, and for computing the loss given inputs and targets, but it fails when computing the gradient.

Please let me know if you have any questions.

c = Chain(
          DataLoader,                       # slice the 3D input (the step in question)
          rnns,                             # first recurrent model
          d -> d[:, :, 1],                  # take the first slice along the 3rd dimension
          d -> reshape(d, (output_feature_len, max_outer_sequence_len, :)),  # 2D -> 3D
          d -> permutedims(d, (1, 3, 2)),   # swap the 2nd and 3rd dimensions
          DataLoader,                       # slice again for the second recurrent model
          rnns2,                            # second recurrent model
          d -> d[:, :, 1],                  # take the first slice along the 3rd dimension
          softmax
) |> gpu
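
For completeness, the gradient computation that fails looks roughly like this. Here X and Y stand in for a padded input batch and its targets, and the crossentropy loss is just illustrative (my real loss differs):

loss(x, y) = Flux.crossentropy(c(x), y)

loss(X, Y)  # works: the forward pass and the loss both evaluate fine

# fails: Zygote cannot differentiate through the DataLoader steps
gs = Flux.gradient(() -> loss(X, Y), Flux.params(c))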