I am trying to define a layer that is a polynomial in Flux. The definition seems to work, but any training with it diverges. It is also impossible to calculate the Jacobian of this polynomial function using Flux, which should be trivial. The following code illustrates the problem with the Jacobian. The question is: what am I doing wrong?

using Flux

using DynamicPolynomials

using MultivariatePolynomials

struct PolyModel

mexp

W

end

function PolyModel(in::Integer, out::Integer, order::Integer)

@polyvar x[1:in]

mx0 = monomials(x, 0:order)

mx1 = [exponents(mx0[k]) for k=1:length(mx0)]

return PolyModel(mx1, param(randn(out, length(mx0))))

end

Flux.@treelike PolyModel

function (a::PolyModel)(z::AbstractArray)

return a.W*[prod([z[k]^(a.mexp[j][k]) for k=1:length(z)]) for j=1:length(a.mexp)]

end

a = PolyModel(4,4,2)

Flux.jacobian(a,zeros(4))

ERROR: MethodError: no method matching Float64(::Flux.Tracker.TrackedReal{Float64})

Closest candidates are:

Float64(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:185

Float64(::T<:Number) where T<:Number at boot.jl:725

Float64(::Int8) at float.jl:60

…

Have not tried to run this, but your problem with ForwardDiff is that `mon = ones(length(a.mexp))`

creates a vector of Float64, but then `mon[j] *= z[k]`

tries to write a dual number into that space, and can’t. You can usually solve this by making sure any new array has the eltype of the input array, like `(z::AbstractArray{T}) where T`

and `ones(T,...)`

.

I suspect the problem with Flux may be the same, `Float64(::TrackedReal{...`

is a similar conversion problem. (Also, Flux does not support mutation of an existing array.) But this code is less obvious to me, what does `@polyvar`

do?

Flux generally likes array operations over scalar operations. Could the line `a.W*[prod([z[k]^(a.mexp[j][k])...`

be written without explicit indexing? Must `mexp`

be an array of arrays?

1 Like

Thanks for the suggestion. There is some improvement. I also needed to add

`Base.convert(::Type{T}, x::Flux.Tracker.TrackedReal{T}) where T<:Real = x.data`

but this has messed things up differently. `Flux.jacobian`

still does not work.

I don’t think you want to do that, it will silently strip tracking information, and prevent you from finding bugs.

Posting here just for completeness - rewriting the problem as per my answer in https://github.com/FluxML/Flux.jl/issues/597#issuecomment-461079720 makes it work.