I am currently implementing a toy version of the architecture from “Deep learning for universal linear embeddings of nonlinear dynamics” (Nature Communications), which is effectively an encoder → affine → decoder process.
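Roughly, the forward map I am after looks like the following sketch (placeholder layer sizes, not my actual code): encode the state, advance it one step with the linear operator K, then decode.

using Flux

encoder = Chain(Dense(2, 16, relu), Dense(16, 2))   # MLP: state -> latent coordinates
decoder = Chain(Dense(2, 16, relu), Dense(16, 2))   # MLP: latent coordinates -> state
K = rand(Float32, 2, 2)                             # linear dynamics in the latent space

X = rand(Float32, 2, 10)    # one trajectory: 2 state variables × 10 snapshots
Z = encoder(X)              # embed the trajectory
Z_next = K * Z              # one linear step forward in latent space
X_pred = decoder(Z_next)    # predicted next snapshots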
I have already implemented the architecture and have been testing it on single-trajectory datasets. This is the struct I am using as the model:
struct A
    encoder
    decoder
    K
end
where the encoder and decoder are MLPs and K is simply a matrix. To move my testing on to multiple trajectories, I am using this model:
struct B
    encoder
    decoder
    K::Vector
end
Here K is a vector holding one matrix per trajectory, i.e. one copy of the K from the model above for each trajectory. The two architectures should therefore be equivalent when training on a single-trajectory dataset.
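Concretely, the intent is that each trajectory gets its own linear operator (a sketch using the field names of the struct above; `predict` is just an illustrative helper):

# For trajectory i, use that trajectory's own matrix K[i].
function predict(m::B, X, i)
    Z = m.encoder(X)          # embed the whole trajectory
    Z_next = m.K[i] * Z       # trajectory-specific linear dynamics
    return m.decoder(Z_next)  # map back to state space
end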
However, I am not getting this expected behavior. On a single-trajectory dataset, A fits well while B does not converge at all, no matter the training time or parameters used; the loss stays essentially constant throughout training, apart from some jitter.
While attempting to print the norm of the gradient for B, I get an error which suggests to me that the gradient is empty. Here is an MWE:
using Flux
using LinearAlgebra

Xs = [rand(2, 10)]    # a single toy "trajectory"

mutable struct B
    mlp
    K::Vector
end

# Toy loss that only involves the per-trajectory matrices K[i]
function loss(Xs, model)
    l = 0
    for i in 1:length(Xs)
        l += norm(model.K[i])
    end
    return l
end

model = B(Dense(2, 2), [rand(2, 2)])
ps = params(model.mlp, model.K)
gs = gradient(() -> loss(Xs, model), ps)
norm(gs)
which gives the following error:
ERROR: MethodError: no method matching iterate(::Nothing)
Closest candidates are:
  iterate(::DataStructures.TrieIterator) at /Users/tylerhan/.julia/packages/DataStructures/ixwFs/src/trie.jl:112
  iterate(::DataStructures.TrieIterator, ::Any) at /Users/tylerhan/.julia/packages/DataStructures/ixwFs/src/trie.jl:112
  iterate(::Cmd) at process.jl:638
  ...
Stacktrace:
 [1] isempty(::Nothing) at ./essentials.jl:737
 [2] norm(::Nothing, ::Int64) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:605 (repeats 2 times)
 [3] generic_normInf(::Zygote.Grads) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:446
 [4] normInf at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:536 [inlined]
 [5] generic_norm2(::Zygote.Grads) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:477
 [6] norm2 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:538 [inlined]
 [7] norm(::Zygote.Grads, ::Int64) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:607
 [8] norm(::Zygote.Grads) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:605
 [9] top-level scope at REPL[9]:1
I am not certain of this, but perhaps this means that the second model’s gradient is empty? I do not get such errors for the first model.
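My understanding is that a Zygote.Grads object can be indexed by parameter, so the individual entries from the MWE above could be checked one at a time, e.g.:

# Check each tracked parameter's gradient individually instead of norm(gs);
# an entry is `nothing` when Zygote produced no gradient for that parameter.
for p in ps
    g = gs[p]
    println(size(p), " => ", g === nothing ? "nothing" : norm(g))
end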
I’d like to know whether there is a fix for this behavior, or whether there is a better way to implement the second model (I have looked at Flux’s “advanced model building” section, but it was not obvious to me how it would help).
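For reference, my reading of that section is that a custom struct is registered with Flux.@functor so that Flux.params can find its fields; applied to the MWE above that would look roughly like the sketch below, though I am not sure whether it helps when one field is a Vector of matrices.

Flux.@functor B              # treat B's fields as trainable parameters

model = B(Dense(2, 2), [rand(2, 2)])
ps = Flux.params(model)      # does this also pick up the matrix stored inside K?
gs = gradient(() -> loss(Xs, model), ps)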
I am currently using Flux v0.12.1.