_generic_matmatmul error in sciml_train

Hello all, I am trying to train a neural ODE model using DiffEqFlux.sciml_train. I have reason to believe that using the outer product of the current state u as the neural net input might give better performance, and I am trying to test that. The model is roughly as follows (simplified a bit to omit what I think are irrelevant details):

cross_system = function (du, u, p, t, net)
	# flatten the outer product u * transpose(u) into a plain vector
	cross_vec = collect(Iterators.flatten(u * transpose(u)))

	# net(p) returns the network as a callable for parameter vector p
	du[1:end] = net(p)(cross_vec)
end

However, I get the following error during autodiff (full trace here):

ERROR: LoadError: DimensionMismatch("matrix A has dimensions (18,18), matrix B has dimensions (1,18)")

Based on the error message I know what the error MEANS, i.e. somewhere in the code it is trying to multiply an 18×18 matrix by a 1×18 row vector, which is undefined. I can also understand that the 1×18 row vector most likely appears due to the transpose (without which I don't really know how to compute the outer product efficiently). But I don't know WHERE the error occurs: I don't think I am doing any matrix-times-row-vector multiplication in my own code, so I suspect it is down to autodiff magic. I'm not at all sure how to resolve this; any help is appreciated.
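For concreteness, the shapes involved look like this (a minimal check with a dummy 18-element state in place of my real u):

julia> u = ones(18);

julia> size(transpose(u))   # the 1×18 row vector from the error message
(1, 18)

julia> size(u * transpose(u))   # the outer product the model wants
(18, 18)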

Iterators.flatten seems like a complicated way to make a vector here, and perhaps difficult for AD to reason about. Some other ways are:

julia> u = [1,2,3];

julia> v = collect(Iterators.flatten(u * transpose(u)));

julia> v == vec(u * transpose(u))
true

julia> v == vec(kron(u,u))
true

Trying these out, it appears that ReverseDiff does not like u * transpose(u) (that is surely a bug), but the kron option works:

julia> ReverseDiff.gradient([1,2,3]) do u
         v = vec(u * transpose(u))
         v[1]
       end
ERROR: DimensionMismatch: matrix A has dimensions (3,3), matrix B has dimensions (1,3)

julia> ReverseDiff.gradient([1,2,3]) do u
         v = vec(kron(u,u))
         v[1]
       end
3-element Vector{Int64}:
 2
 0
 0

julia> Zygote.gradient([1,2,3]) do u
         v = vec(u * transpose(u))
         v[1]
       end
([2.0, 0.0, 0.0],)

julia> Zygote.gradient([1,2,3]) do u
         v = vec(kron(u,u))
         v[1]
       end
([2.0, 0.0, 0.0],)

# Errors with collect(Iterators.flatten(...))

julia> ReverseDiff.gradient([1,2,3]) do u
         v = collect(Iterators.flatten(u * transpose(u)))
         v[1]
       end
ERROR: DimensionMismatch: matrix A has dimensions (3,3), matrix B has dimensions (1,3)
Stacktrace:
  [1] _generic_matmatmul!(C::Vector{Int64}, tA::Char, tB::Char, A::Matrix{Int64}, B::Vector{Int64}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
...

julia> Zygote.gradient([1,2,3]) do u
         v = collect(Iterators.flatten(u * transpose(u)))
         v[1]
       end
ERROR: Mutating arrays is not supported -- called copyto!(::Vector{Int64}, _...)
Stacktrace:
...
  [5] (::typeof(∂(_collect)))(Δ::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/compiler/interface2.jl:0
...

Many thanks, kron(u, u) (which I did not know about previously) indeed works, no more errors! Out of curiosity, why do you use vec(kron(u, u)) instead of just kron(u, u)? When I tested them, both seemed to give the same result.

Oh right, there is no need for vec(kron(...)); I just wasn't thinking.
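To spell that out: kron of two vectors is already a vector, so vec is a no-op there:

julia> kron([1,2,3], [1,2,3])
9-element Vector{Int64}:
 1
 2
 3
 2
 4
 6
 3
 6
 9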

The transpose issue is worth filing on ReverseDiff. Even something like ReverseDiff.gradient(u -> sum(u * u'), [1,2,3]) does not work, so some rule is not handling row vectors correctly.
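For reference, the original model function rewritten with kron would then look roughly like this (a sketch, keeping the in-place form from the original post and assuming net(p) returns a callable that accepts a length(u)^2 input):

cross_system = function (du, u, p, t, net)
	# kron(u, u) equals the flattened outer product vec(u * transpose(u)),
	# but avoids the transpose that trips up ReverseDiff
	du[1:end] = net(p)(kron(u, u))
end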
