How to Take Full Advantage of GPU Parallelism on Nested Sequential Data in Flux

Your architecture should not be hard to parallelize. You can use a 4D input tensor with shape (num_features, max_inner_seq_length, max_outer_seq_length, batch_size). To make all inner (and outer) sequences the same length, introduce a special padding symbol, as in the sketch below.
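Here is a minimal padding sketch in Julia. All names (`pad_to_4d`, `docs`, the `pad` value) are illustrative, not part of any library API:

```julia
# `docs` is a vector of outer sequences; each outer sequence is a vector of
# inner sequences; each inner sequence is a (num_features, len) matrix.
function pad_to_4d(docs, num_features; pad = 0.0f0)
    max_inner = maximum(size(s, 2) for d in docs for s in d)
    max_outer = maximum(length(d) for d in docs)
    batch = length(docs)
    x = fill(pad, num_features, max_inner, max_outer, batch)  # pre-filled with padding
    for (b, d) in enumerate(docs), (o, s) in enumerate(d)
        x[:, 1:size(s, 2), o, b] .= s                         # copy the real tokens in
    end
    return x
end
```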

For your first pass, you can reshape this input tensor to (num_features, max_inner_seq_length, max_outer_seq_length * batch_size) and use any sequence processing model out of the box (e.g. an RNN or a Transformer), reducing each sequence to a single vector (for instance by keeping the last hidden state). This gives you an output tensor of shape (num_out_features, max_outer_seq_length * batch_size).
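As a rough sketch of this first pass (assuming a Flux version with the classic stateful `Recur`/`Flux.reset!` recurrent API — newer Flux releases changed this interface — and with all dimensions illustrative):

```julia
using Flux

num_features, hidden = 8, 16
max_inner, max_outer, batch = 12, 5, 32
x  = rand(Float32, num_features, max_inner, max_outer, batch)  # padded 4D input
xr = reshape(x, num_features, max_inner, max_outer * batch)

# Fold a stateful RNN over the time dimension and keep the last hidden state.
function run_rnn(rnn, xs)                  # xs :: (features, time, n)
    Flux.reset!(rnn)
    h = rnn(xs[:, 1, :])
    for t in 2:size(xs, 2)
        h = rnn(xs[:, t, :])               # one step over all n sequences at once
    end
    return h                               # (hidden, n)
end

inner_rnn = Flux.RNN(num_features => hidden)
h = run_rnn(inner_rnn, xr)                 # (hidden, max_outer * batch)
```

Every inner sequence across the whole batch is processed in one batched RNN call per time step, which is what keeps the GPU busy.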

For your second pass, you can reshape the output of the first pass to (num_out_features, max_outer_seq_length, batch_size) and once again run any sequence processing model of your choice to get an output of shape (num_labels, batch_size).
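Continuing the sketch above (`num_labels` is illustrative):

```julia
num_labels = 4
hr = reshape(h, hidden, max_outer, batch)   # regroup the summaries per outer sequence

outer_rnn = Flux.RNN(hidden => hidden)
g = run_rnn(outer_rnn, hr)                  # (hidden, batch)
out = Flux.Dense(hidden => num_labels)(g)   # (num_labels, batch)
```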

Many sequence processing models also let you specify masks to ensure that padding symbols are ignored (although skipping this is usually not a deal-breaker).
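Flux's recurrent layers do not take a mask argument directly, so here is one manual workaround, again only a sketch: assuming padding sits at the end of each inner sequence and you know the true lengths, read each sequence's hidden state at its last valid position instead of at `max_inner`:

```julia
function last_valid_state(rnn, xs, lengths)
    Flux.reset!(rnn)
    states = [rnn(xs[:, t, :]) for t in 1:size(xs, 2)]   # hidden state at every step
    # For column j, pick the state produced at its true last position.
    return reduce(hcat, [states[lengths[j]][:, j] for j in eachindex(lengths)])
end

lengths = rand(1:max_inner, max_outer * batch)   # illustrative true lengths
h = last_valid_state(inner_rnn, xr, lengths)     # trailing padding never leaks in
```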


The suggestion above works best if the inner and outer sequences do not vary too much in length. When they do, alternative approaches are possible. For example, you could concatenate all tokens into a (num_features, num_tokens) tensor and keep a separate tensor of shape (3, num_tokens) associating each token with an (inner_id, outer_id, batch_id) index triple. Manipulating such data requires scatter operations, which are implemented in GeometricFlux for example; see the sketch below.
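Here is a rough sketch of that layout. In the current ecosystem the scatter primitives also live in NNlib (which GeometricFlux-style libraries build on); all dimensions and index values below are illustrative:

```julia
using NNlib

num_features, num_tokens = 8, 100
max_outer, batch = 5, 4
tokens = rand(Float32, num_features, num_tokens)   # all tokens, concatenated
ids = vcat(rand(1:12, 1, num_tokens),              # inner_id
           rand(1:max_outer, 1, num_tokens),       # outer_id
           rand(1:batch, 1, num_tokens))           # batch_id, i.e. (3, num_tokens)

# Example: sum-pool each token into its (outer_id, batch_id) bucket by
# collapsing the two indices into one linear bucket index.
bucket = (ids[3, :] .- 1) .* max_outer .+ ids[2, :]
pooled = NNlib.scatter(+, tokens, bucket)          # (num_features, maximum(bucket))
```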

Finally, if your data features deeper nesting, it might be useful to start looking at graph neural networks (such as those implemented in GeometricFlux).
