I am currently implementing a toy version of the architecture as found in:
*Deep learning for universal linear embeddings of nonlinear dynamics* (Nature Communications), which is effectively an encoder → affine → decoder pipeline.
I have already implemented the architecture and have been testing it on single-trajectory datasets. This is the struct I am using as the model:
```julia
struct A
    encoder
    decoder
    K
end
```
where the encoder and decoder are MLPs and K is simply a matrix. To extend my testing to multiple trajectories, I am using this model:
```julia
struct B
    encoder
    decoder
    K::Vector
end
```
Here, K holds one matrix (of the kind used in A) per trajectory, so the two architectures should be equivalent when training on a single-trajectory dataset.
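For concreteness, here is a simplified, self-contained sketch of how I intend both forward passes to work (this is shorthand for the actual architecture, not my real code; the one-step map is decode(K · encode(x))):

```julia
struct A
    encoder
    decoder
    K
end

# one prediction step: encode, advance linearly with K, decode
(m::A)(x) = m.decoder(m.K * m.encoder(x))

struct B
    encoder
    decoder
    K::Vector
end

# trajectory i gets its own linear operator K[i]
(m::B)(x, i) = m.decoder(m.K[i] * m.encoder(x))
```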
However, I am not seeing this expected behavior. On a single-trajectory dataset, A fits well, while B does not converge at all, regardless of training time or hyperparameters; the loss stays nearly constant throughout training, with some jitter.
While attempting to print the norm of the gradient for B, I get an error which suggests to me that the gradient is empty. Here is an MWE:
```julia
using Flux
using LinearAlgebra

Xs = [rand(2, 10)]

mutable struct B
    mlp
    K::Vector
end

function loss(Xs, model)
    l = 0
    for i in 1:length(Xs)
        l += norm(model.K[i])
    end
    return l
end

model = B(Dense(2, 2), [rand(2, 2)])
ps = params(model.mlp, model.K)
gs = gradient(() -> loss(Xs, model), ps)
norm(gs)
```
which gives the following error:
```
ERROR: MethodError: no method matching iterate(::Nothing)
Closest candidates are:
  iterate(::DataStructures.TrieIterator) at /Users/tylerhan/.julia/packages/DataStructures/ixwFs/src/trie.jl:112
  iterate(::DataStructures.TrieIterator, ::Any) at /Users/tylerhan/.julia/packages/DataStructures/ixwFs/src/trie.jl:112
  iterate(::Cmd) at process.jl:638
  ...
Stacktrace:
 isempty(::Nothing) at ./essentials.jl:737
 norm(::Nothing, ::Int64) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:605 (repeats 2 times)
 generic_normInf(::Zygote.Grads) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:446
 normInf at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:536 [inlined]
 generic_norm2(::Zygote.Grads) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:477
 norm2 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:538 [inlined]
 norm(::Zygote.Grads, ::Int64) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:607
 norm(::Zygote.Grads) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:605
 top-level scope at REPL:1
```
I am not certain of this, but perhaps this means that the second model’s gradient is empty? I do not get such errors for the first model.
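In case it is a useful diagnostic, Zygote's `Grads` object can be indexed by parameter array, so each parameter's gradient can be inspected individually instead of calling `norm` on the whole `Grads` (which fails if any entry is `nothing`). This self-contained snippet rebuilds the MWE state (the struct is renamed purely to avoid a clash with the definition above) and prints each gradient's status:

```julia
using Flux, LinearAlgebra

# Same setup as the MWE above, with the struct renamed for illustration.
Xs = [rand(2, 10)]
mutable struct Binspect
    mlp
    K::Vector
end
loss(Xs, model) = sum(norm(model.K[i]) for i in 1:length(Xs))

model = Binspect(Dense(2, 2), [rand(2, 2)])
ps = params(model.mlp, model.K)
gs = gradient(() -> loss(Xs, model), ps)

# Query the Grads object per parameter; entries with no gradient are `nothing`.
for p in ps
    g = gs[p]
    println(size(p), " => ", g === nothing ? "no gradient" : norm(g))
end
```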
I’d like to know whether there is a fix for this unexpected behavior, or whether there is a better way to implement the second model (I have looked at Flux’s “advanced model building” section, but it was not obvious to me how it would help).
I am currently using Flux v0.12.1.