Broadcasting in loss function for RNN with timeseries

I’m trying to get an RNN to work with timeseries and I don’t understand why things don’t work as expected. I found a workaround, but it just seems to be too complicated.

My data consist of 210 observations, each a sequence of 29 values. With the chunk function I split the (29x210) matrix into a 210-element Array train_x.

X = reshape(rand(29*210), (29,210))
train_x = chunk(X, 210)

To get my y in the same shape, I apply chunk there as well:
y = chunk(rand(210), 210)

I created an RNN model m = Chain(RNN(29,10), Dense(10,1)) that returns a single value. I can now run this as yhat = m.(train_x)

This outputs a 210-element Array. So far, so good, I think. Now I’m trying to create a MAE loss function. So, yhat .- y works as expected (where I can even omit the dot). Now I’m trying to apply the abs function. I can understand why abs(yhat .- y) does not work, but I would have expected abs.(yhat.-y) to work. Which it does not…

Well, I can flatten the difference with ..., but that leaves me with abs.([((yhat .- y)...)...]) which is both ugly and unreadable. I’m guessing I’m doing something wrong here, but I don’t understand why abs.(yhat.-y) does not work as expected.

Also, I will want to feed this to train! eventually, and I’m not 100% sure that all these dots will work as expected with the gradients.

Questions:

  • can someone explain why abs.(yhat .- y) does not work as expected?
  • What is the advised way to approach this? Is this approach with all the ... indeed what is necessary, or is there a more elegant way?

Could it be that you accidentally overwrote abs in your session? This should just work:

julia> a = rand(3);
julia> b = rand(3);
julia> abs.(a - b)
3-element Array{Float64,1}:
 0.478177259423227
 0.047047520176273006
 0.8114939438678288

You’re in luck, Flux already has a built-in MAE loss: https://fluxml.ai/Flux.jl/stable/models/losses/#Losses-Reference-1

To add on to @visr’s comment, you’ll also have to clarify what exactly “does not work as expected” means. Is an error being thrown? Are the outputs not what you expected? The gradients? A minimal working example would be much appreciated :slight_smile:

I’m trying to remember - I think a few versions back some loss functions had unexpected type instability. So along those lines, maybe try upgrading Flux?

Also are you emitting a scalar value for Flux to back prop? I see you mentioning MAE, but I see no sum() or mean() in the offending code?

Error message would be very appreciated :smiley:

abs.(a-b) works fine, so that does not seem to be the problem.
The built-in MAE loss is exactly what I coded by hand and would expect, e.g. agg(abs.(ŷ .- y)).

I think the problem lies in the shape:
z = [[1], [2], [3]]
abs.(z)

doesn’t work either. The error message is ERROR: MethodError: no method matching abs(::Array{Int64,1})

So, I guess I will need to remove the extra brackets, somehow flattening the array. Still, I’m a bit confused why this is happening in the first place.
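
To make the shape issue concrete, here is a small sketch in plain Julia (no Flux needed). A single dot only broadcasts one level deep, so abs ends up being called on each inner vector, which is where the MethodError comes from. The names nested_abs, flat and flat_abs are just for illustration, and the two options shown are possible ways around it, not necessarily the recommended one.

```julia
# A vector of 1-element vectors, the same kind of shape as the z example above
z = [[-1], [2], [-3]]

# abs.(z) would call abs on each inner Vector, for which no method exists.

# Option 1: broadcast at both levels, keeping the nested shape
nested_abs = map(v -> abs.(v), z)

# Option 2: flatten to a plain Vector first, then broadcast once
flat = reduce(vcat, z)
flat_abs = abs.(flat)
```

Option 2 avoids the double splat, since reduce(vcat, ...) concatenates the inner vectors into one flat vector in a single call.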

A minimal working example is this:

using Flux          # brings Chain, RNN and Dense into scope
using Flux: chunk
m = Chain(RNN(29,10), Dense(10,1))
X = reshape(rand(29*210), (29,210))
train_x = chunk(X, 210)
ŷ = m.(train_x)
y = chunk(rand(210), 210)
abs.(ŷ .- y)
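
For what it is worth, a hand-written MAE that copes with this nested shape could look roughly like the sketch below. It uses made-up stand-in data instead of the model output so that it runs without Flux, and the function name mae is my own, not something from Flux. It also returns a single scalar, which is what a loss passed to train! needs. Whether gradients flow through reduce(vcat, ...) in a particular Flux version is an assumption I have not checked here.

```julia
using Statistics: mean

# Stand-ins for the model output and the targets: vectors of 1-element vectors
yhat = [[0.5], [1.0], [2.0]]
y = [[1.0], [1.0], [0.0]]

# Flatten both sides, then reduce the absolute errors to a single scalar
mae(yhat, y) = mean(abs.(reduce(vcat, yhat) .- reduce(vcat, y)))

loss = mae(yhat, y)
```

With the real data, yhat would be m.(train_x) and y the chunked targets from the example above.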