I am currently implementing a toy version of the architecture from “Deep learning for universal linear embeddings of nonlinear dynamics” (Nature Communications), which is effectively an encoder → affine → decoder process.
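Roughly, the forward map I am after looks like the following sketch (placeholder layer sizes, not my actual code): encode the state, advance it one step with the linear operator K, then decode.

using Flux

encoder = Chain(Dense(2, 16, relu), Dense(16, 2))   # MLP: state -> latent coordinates
decoder = Chain(Dense(2, 16, relu), Dense(16, 2))   # MLP: latent coordinates -> state
K = rand(Float32, 2, 2)                             # linear dynamics in the latent space

X = rand(Float32, 2, 10)    # one trajectory: 2 state variables × 10 snapshots
Z = encoder(X)              # embed the trajectory
Z_next = K * Z              # one linear step forward in latent space
X_pred = decoder(Z_next)    # predicted next snapshots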
I have already implemented the architecture and have been testing it on single-trajectory datasets. This is the struct I am using as the model:
struct A
    encoder
    decoder
    K
end
where the encoder and decoder are MLPs and K is simply a matrix. To move my testing on to multiple trajectories, I am using this model:
struct B
    encoder
    decoder
    K::Vector
end
Here K is a vector holding one matrix per trajectory, i.e. one copy of the K from the model above for each trajectory. The two architectures should therefore be equivalent when training on a single-trajectory dataset.
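Concretely, the intent is that each trajectory gets its own linear operator (a sketch using the field names of the struct above; `predict` is just an illustrative helper):

# For trajectory i, use that trajectory's own matrix K[i].
function predict(m::B, X, i)
    Z = m.encoder(X)          # embed the whole trajectory
    Z_next = m.K[i] * Z       # trajectory-specific linear dynamics
    return m.decoder(Z_next)  # map back to state space
end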
However, I am not getting this expected behavior. On a single-trajectory dataset, A fits well while B does not converge at all, no matter the training time or parameters used; the loss stays essentially constant throughout training, apart from some jitter.
While attempting to print the norm of the gradient for B, I get an error which suggests to me that the gradient is empty. Here is an MWE:
using Flux
using LinearAlgebra

Xs = [rand(2, 10)]    # a single toy "trajectory"

mutable struct B
    mlp
    K::Vector
end

# Toy loss that only involves the per-trajectory matrices K[i]
function loss(Xs, model)
    l = 0
    for i in 1:length(Xs)
        l += norm(model.K[i])
    end
    return l
end

model = B(Dense(2, 2), [rand(2, 2)])
ps = params(model.mlp, model.K)
gs = gradient(() -> loss(Xs, model), ps)
norm(gs)
which gives the following error:
ERROR: MethodError: no method matching iterate(::Nothing)
Closest candidates are:
  iterate(::DataStructures.TrieIterator) at /Users/tylerhan/.julia/packages/DataStructures/ixwFs/src/trie.jl:112
  iterate(::DataStructures.TrieIterator, ::Any) at /Users/tylerhan/.julia/packages/DataStructures/ixwFs/src/trie.jl:112
  iterate(::Cmd) at process.jl:638
  ...
Stacktrace:
 [1] isempty(::Nothing) at ./essentials.jl:737
 [2] norm(::Nothing, ::Int64) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:605 (repeats 2 times)
 [3] generic_normInf(::Zygote.Grads) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:446
 [4] normInf at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:536 [inlined]
 [5] generic_norm2(::Zygote.Grads) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:477
 [6] norm2 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:538 [inlined]
 [7] norm(::Zygote.Grads, ::Int64) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:607
 [8] norm(::Zygote.Grads) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:605
 [9] top-level scope at REPL[9]:1
I am not certain of this, but perhaps this means that the second model’s gradient is empty? I do not get such errors for the first model.
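My understanding is that a Zygote.Grads object can be indexed by parameter, so the individual entries from the MWE above could be checked one at a time, e.g.:

# Check each tracked parameter's gradient individually instead of norm(gs);
# an entry is `nothing` when Zygote produced no gradient for that parameter.
for p in ps
    g = gs[p]
    println(size(p), " => ", g === nothing ? "nothing" : norm(g))
end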
I’d like to know whether there is a fix for this behavior, or whether there is a better way to implement the second model (I have looked at Flux’s “advanced model building” section, but it was not obvious to me how it would help).
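For reference, my reading of that section is that a custom struct is registered with Flux.@functor so that Flux.params can find its fields; applied to the MWE above that would look roughly like the sketch below, though I am not sure whether it helps when one field is a Vector of matrices.

Flux.@functor B              # treat B's fields as trainable parameters

model = B(Dense(2, 2), [rand(2, 2)])
ps = Flux.params(model)      # does this also pick up the matrix stored inside K?
gs = gradient(() -> loss(Xs, model), ps)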
I am currently using Flux v0.12.1.