Sequence-to-one modelling and Flux.reset!

I’m trying to build a sequence-to-one model using Flux and I’m running into an error that I have not been able to solve.

Following the thread “Building simple sequence-to-one RNN with Flux” (New to Julia, JuliaLang), it seems this wasn’t an issue at the time, but I’m not sure, since I didn’t try it out back then.

The problem is that when I try to use Flux.reset! I get an error message.

Here’s a minimal working example that reproduces the error:

```julia
using Flux

# 10 time steps, each a 2×32 matrix (2 features, batch of 32)
x = [rand(Float32, 2, 32) for _ ∈ 1:10]
y = rand(Float32, 1, 32)

mutable struct MyModel
    rnn
    fc
end
Flux.@functor MyModel

function (m::MyModel)(x)
    Flux.reset!(m.rnn) # THIS IS THE PROBLEMATIC LINE, IF REMOVED THE CODE RUNS FINE
    [m.rnn(x[i]) for i ∈ 1:length(x)-1]  # run all but the last step, discarding the outputs
    m.fc(m.rnn(x[end]))                   # only the last step's output goes to the Dense layer
end
m = MyModel(RNN(2, 5), Dense(5, 1))

loss(x, y) = Flux.mse(m(x), y)
opt = Descent(1e-2)
ps = Flux.params(m)
# This works
loss(x, y)
# This doesn't work
Flux.train!(loss, ps, [(x, y)], opt)
```

Strangely enough, loss(x, y) works just fine, but when it is used with Flux.train! it throws an error:

```
ERROR: LoadError: DimensionMismatch("new dimensions (5, 1) must be consistent with array size 160")
```
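The numbers in the message line up with the shapes in the example: `RNN(2, 5)` has 5 hidden units and each input carries a batch of 32, so a batched hidden state (or a gradient with respect to it) holds 5 × 32 = 160 values, while (5, 1) looks like the shape of the RNN’s initial state. That reading is an assumption, not a diagnosis of what Zygote does inside `reset!`; the small Base-only sketch here just shows that forcing a 160-element array into shape (5, 1) raises this kind of `DimensionMismatch`.

```julia
# Shapes taken from the question: RNN(2, 5) and a batch size of 32
hidden, batch = 5, 32
@assert hidden * batch == 160   # matches "array size 160" in the error

# Stand-in for a (hidden, batch) array, e.g. a batched hidden state or its gradient
g = zeros(Float32, hidden, batch)

# Forcing it into a (5, 1) shape, like the RNN's initial state, cannot work
err = try
    reshape(g, hidden, 1)
    nothing
catch e
    e
end
println(typeof(err))   # DimensionMismatch
```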

Am I doing something wrong in the way I’m building my model or is this a bug?

Thanks in advance for your help!

Without diagnosing the specific issue, I would strongly recommend calling reset! outside of your loss function. Unfortunately, that means ditching train!, but in my experience train! doesn’t really work for RNNs in the first place, because of the assumptions it makes about input batching.
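A minimal sketch of that approach, assuming the Flux 0.12/0.13-era implicit-parameter API used in the question (`Flux.params`, `Flux.gradient` with a zero-argument closure, `Flux.Optimise.update!`): the model’s forward pass no longer calls `reset!`, and a hand-written loop resets the state before each gradient computation. The single-batch `data` vector is illustrative only; with real data you would iterate over many `(x, y)` pairs.

```julia
using Flux

x = [rand(Float32, 2, 32) for _ ∈ 1:10]   # 10 time steps, batch of 32
y = rand(Float32, 1, 32)

mutable struct MyModel
    rnn
    fc
end
Flux.@functor MyModel

# Forward pass WITHOUT reset!: the caller is responsible for resetting the state
function (m::MyModel)(x)
    for xt in x[1:end-1]
        m.rnn(xt)                 # advance the hidden state, discard the output
    end
    m.fc(m.rnn(x[end]))           # only the last output feeds the Dense layer
end

m = MyModel(RNN(2, 5), Dense(5, 1))
loss(x, y) = Flux.mse(m(x), y)
opt = Descent(1e-2)
ps = Flux.params(m)

data = [(x, y)]                   # illustrative: a single (sequence, target) batch
for (xb, yb) in data
    Flux.reset!(m.rnn)                           # reset OUTSIDE the differentiated code
    gs = Flux.gradient(() -> loss(xb, yb), ps)   # gradients w.r.t. the implicit params
    Flux.Optimise.update!(opt, ps, gs)           # apply one Descent step
end
```

Because `reset!` runs outside the closure passed to `gradient`, Zygote never has to differentiate through it.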


Thank you so much, that does indeed solve the issue!