Multilayer perceptron with time-dependent multidimensional data using Flux

Hi, I have some problems coding a multilayer perceptron using Flux. My data is given in the form

D_t = (x_i, y_i)_{i=1}^n, where x_i is a scalar, y_i is a 3-dim vector, and t = 1,\ldots,T are the timesteps. Note that (x_1, y_1) in D_1 and (x_1, y_1) in D_2 correspond to each other, i.e. the latter is the same data point at the next time step. For a single time step (T=1), I constructed the data for my neural network as follows:

using Flux

n = 10
nsamples = n; in_features = 1; out_features = 3
X = randn(Float32, in_features, nsamples)   # inputs: one scalar per sample
Y = randn(Float32, out_features, nsamples)  # targets: one 3-dim vector per sample
data = Flux.Data.DataLoader((X, Y), batchsize=4)  # automatic batching convenience
X1, Y1 = first(data)  # first mini-batch
@show size(X1) size(Y1)

Now I want to extend the network to T>1. My first idea was to reshape the data, but I could not find a way to extract the results afterwards. My other idea was to increase the number of input and output features. Therefore I used

in_features = T

But for the output features I could not find a way, since in my case out_features is the dimension of the output vector. So I would actually need something like
out_features = (T, 3)

But I am pretty sure that this will not work. Can someone suggest a better strategy for this kind of problem?


It’s hard from the mathematical description to discern what the actual desired data shapes are. Pseudocode and/or descriptions of each dimension would be far more helpful.

One thing that might help you: Flux’s Dense layer can take multiple dimensions between the input and batch. This is useful if you can apply the MLP over each timestep simultaneously.
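To illustrate the point about Dense, here is a minimal sketch, assuming a recent Flux version that accepts the `Dense(in => out)` syntax. The sizes, the hidden width of 16, and the name `model` are made up for illustration and are not taken from the original post. The only convention assumed is features in the first dimension, batch in the last, and time in between.

```julia
using Flux

in_features, out_features = 2, 3   # assumed sizes for illustration
T, nsamples = 20, 10               # assumed number of timesteps and sequences

# A small MLP; the hidden width 16 is an arbitrary choice.
model = Chain(Dense(in_features => 16, relu), Dense(16 => out_features))

# Features first, batch last, time in between.
x = randn(Float32, in_features, T, nsamples)

# Dense acts on the first dimension and keeps all trailing dimensions,
# so the same MLP is applied to every (timestep, sample) pair.
y = model(x)
@show size(y)
```

If this matches your setup, the output keeps the time and batch dimensions and only the feature dimension changes, so no manual reshaping is needed to get the results back per timestep.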


I see, let me start by giving an example of one specific data point in the dataset.

Example: Assume two input features, a scalar height and a scalar weight; my output is a 3-dim vector depending on time. One could write this data point (x_1, y_1(t)) as:

x_1 = (height_1, weight_1) and y_1(t) = (t, t^2, t^3) with t = 1, \ldots, T

Another way to represent the target would be

y_1 = [(1, 1, 1), (2, 4, 8),\ldots,(T, T^2, T^3)].

Of course this is only one data point in a dataset of n samples. I wrote it in a more general form in Julia:

nsamples = 10
input_features = 2 # number of input features e.g. (height, weight)
output_features = 3 # dim of the vector e.g. (t, t^2, t^3)
T = 20 # timesteps

# One specific data point 
x1 = randn(Float32, input_features)
y1 = randn(Float32, T, output_features)

@show x1 y1

# Data set i.e. input X and target Y contains nsamples of these data points
X = randn(Float32, input_features, nsamples)
Y = randn(Float32, T, output_features, nsamples)

@show X Y

I’m not exactly sure I understand then. The input has no time dimension, but then the model adds one based on the height and weight? That’s definitely beyond what a plain MLP is capable of, so I assume you have a more complex model than originally described.

Maybe I should show what kind of data structure I have with a picture. Please note that all the values are fictional.

One can see that I have repeated weight and timestep values. Another way to compress the data would be the following data structure (only N/3 of the data is needed):

Now I am searching for a way to train an ANN such that my prediction yields:

pred(timestep, weight) \rightarrow \text{3 dim vector}


pred(timesteps, weight) \rightarrow \text{time series of 3 dim vectors}
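One possible reading of these two signatures, as a rough sketch only: treat the timestep as an ordinary input feature next to the weight, so that a plain MLP maps a 2-dim input to a 3-dim output. The layer sizes and the activation are arbitrary assumptions, the model below is untrained, and `Dense(in => out)` assumes a recent Flux version.

```julia
using Flux

# Hypothetical model: input is (timestep, weight), output is a 3-dim vector.
model = Chain(Dense(2 => 32, tanh), Dense(32 => 3))

# Predict for a single (timestep, weight) pair.
pred(timestep::Real, weight::Real) = model(Float32[timestep, weight])

# Predict a whole time series for one weight: each timestep becomes one column.
function pred(timesteps::AbstractVector, weight::Real)
    x = vcat(Float32.(timesteps)', fill(Float32(weight), 1, length(timesteps)))
    return model(x)   # one 3-dim column per timestep
end

@show size(pred(1, 70.0))
@show size(pred(1:20, 70.0))
```

With this layout the time series prediction is just a batched call of the single-timestep model, which only makes sense if the prediction at one timestep does not depend on the predictions at other timesteps.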

The #1 question is, where does the batch dimension fit in? Unless you’re trying to fit the network to a single time series, I imagine you’ll have multiple sequences in this format.

As far as I know, the batch size is just the number of training examples in one training cycle, right? So I do not really care about the size yet; I would just take all samples at once and train the model, if needed.

I mean, both pictures are essentially the same. The second one is just a compressed version of the first one. In the end, both pictures show the structure of my actual data. The question is how I can train an ANN with this kind of data structure.

Maybe for the first picture I could write the dimension of my data like:

X = zeros(2, nsamples)
Y = zeros(3, nsamples)

and for the second one with number of time steps equal to T like:

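nseq = nsamples ÷ 3                         # number of compressed entries
X = Array{Vector{Float32}}(undef, 2, nseq)  # each entry is meant to hold a length-T vector
Y = Array{Vector{Float32}}(undef, T, nseq)  # each entry is meant to hold a 3-dim vector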
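To make the relation between the two layouts concrete, here is a sketch in plain Julia (no Flux needed). It assumes the compressed data is stored as a dense array of size 3 × T × nseq with one weight per sequence, rather than as an array of vectors. The names `weights`, `Yc`, and `nseq`, and all sizes, are hypothetical.

```julia
T, nseq = 20, 5                      # assumed sizes
weights = randn(Float32, nseq)       # one weight per sequence
Yc = randn(Float32, 3, T, nseq)      # compressed targets: 3-dim vector per (timestep, sequence)

# Long layout: one column per (timestep, sequence) pair, timestep varying fastest.
timesteps = repeat(Float32.(1:T), nseq)   # 1,...,T, 1,...,T, ...
ws = repeat(weights, inner=T)             # w1,...,w1, w2,...,w2, ...
X = permutedims(hcat(timesteps, ws))      # size (2, T * nseq), rows: timestep, weight
Y = reshape(Yc, 3, T * nseq)              # size (3, T * nseq)

# Going back: sequence k is the k-th slice along the last dimension.
Yback = reshape(Y, 3, T, nseq)

@show size(X) size(Y) size(Yback)
```

Because Julia arrays are column-major, the reshape does not move any data, so switching between the long layout (convenient for training) and the compressed layout (convenient for reading off one time series) is cheap.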

Sorry, perhaps I wasn’t clear enough. What I see from the pictures is that the dimension labelled as “n-samples” is actually the time dimension, as the timestep varies uniformly over it. That gives the impression that the whole table represents a single sequence (i.e. a batch of one) instead of multiple sequences. Now it may be that you’re looking to slice and dice this single sequence into subsequences to batch and train on simultaneously, but without further context I can only guess at whether that’s the case.
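In case slicing a single sequence into subsequences is what you are after, here is a rough sketch in plain Julia of one way to do it. The sequence length, the window length, and the choice of non-overlapping windows are all assumptions for illustration.

```julia
# Hypothetical single sequence: 3 features observed at 100 timesteps.
seq = randn(Float32, 3, 100)
window = 20                       # assumed subsequence length

# Non-overlapping windows; any trailing remainder is dropped.
nwin = size(seq, 2) ÷ window
batch = reshape(seq[:, 1:nwin*window], 3, window, nwin)   # (features, time, batch)

@show size(batch)
```

Each slice `batch[:, :, k]` is then one subsequence, and the last dimension can serve as the batch dimension.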