Flux: user defined layer problem

I am trying to define a layer that is a polynomial in Flux. The definition seems to work, but any training with it diverges. It is also impossible to calculate the Jacobian of this polynomial function using Flux, which should be trivial. The following code illustrates the problem with the Jacobian. The question is: what am I doing wrong?

using Flux
using DynamicPolynomials
using MultivariatePolynomials

struct PolyModel
    mexp   # exponent vector of each monomial
    W      # tracked coefficient matrix, out × number of monomials
end

function PolyModel(in::Integer, out::Integer, order::Integer)
    @polyvar x[1:in]
    mx0 = monomials(x, 0:order)
    mx1 = [exponents(mx0[k]) for k=1:length(mx0)]
    return PolyModel(mx1, param(randn(out, length(mx0))))
end

Flux.@treelike PolyModel

# Evaluate every monomial at z, then combine them linearly with W
function (a::PolyModel)(z::AbstractArray)
    return a.W*[prod([z[k]^(a.mexp[j][k]) for k=1:length(z)]) for j=1:length(a.mexp)]
end

a = PolyModel(4,4,2)

Computing the Jacobian of this model with Flux.jacobian then fails with:

ERROR: MethodError: no method matching Float64(::Flux.Tracker.TrackedReal{Float64})
Closest candidates are:
Float64(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:185
Float64(::T<:Number) where T<:Number at boot.jl:725
Float64(::Int8) at float.jl:60

I have opened a ticket about this here: https://github.com/FluxML/Flux.jl/issues/597

Have not tried to run this, but your problem with ForwardDiff is that mon = ones(length(a.mexp)) creates a vector of Float64, but then mon[j] *= z[k] tries to write a dual number into that space, and can’t. You can usually solve this by making sure any new array has the eltype of the input array, like (z::AbstractArray{T}) where T and ones(T,...).
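
To make the eltype suggestion concrete, here is a minimal sketch in plain Julia. The function name monomial_values and the exponent list exps are made up for illustration; they are not from the original code. It only shows the allocation pattern that lets ForwardDiff store its dual numbers. It still updates the buffer in place, so I would not expect it to work with Flux's tracker as-is.

```julia
# Evaluate each monomial prod_k z[k]^e[k], where `exps` holds one exponent
# vector per monomial (hypothetical layout, for illustration only).
# The buffer takes the element type T of the input, so number types such as
# ForwardDiff duals can be written into it without conversion to Float64.
function monomial_values(z::AbstractArray{T}, exps) where T
    mon = ones(T, length(exps))          # not ones(length(exps)), which is Float64
    for j in 1:length(exps), k in 1:length(z)
        mon[j] *= z[k]^exps[j][k]
    end
    return mon
end

exps = [[0, 0], [1, 0], [0, 1], [1, 1]]  # 1, z1, z2, z1*z2
vals = monomial_values(Float32[2, 3], exps)
```

With a Float32 input the result stays a Vector{Float32}, which is the same mechanism that keeps dual numbers intact.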

I suspect the problem with Flux may be the same, Float64(::TrackedReal{... is a similar conversion problem. (Also, Flux does not support mutation of an existing array.) But this code is less obvious to me, what does @polyvar do?

Flux generally likes array operations over scalar operations. Could the line a.W*[prod([z[k]^(a.mexp[j][k])... be written without explicit indexing? Must mexp be an array of arrays?
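
As a rough sketch of what I mean by avoiding explicit indexing: if the exponents were stored as a matrix E with one row per monomial (an assumption, the original keeps an array of arrays), the monomial values could come from a single broadcast. The names E and monomials_broadcast are hypothetical, and I have only checked this in plain Julia, not that any particular Flux version differentiates through it.

```julia
# Hypothetical exponent matrix: one row per monomial, one column per input.
# Rows correspond to 1, z1, z2, z1^2, z1*z2, z2^2.
E = [0 0; 1 0; 0 1; 2 0; 1 1; 0 2]
z = [2.0, 3.0]

# Indexed form, mirroring the comprehension in the original layer.
indexed = [prod([z[k]^E[j, k] for k in 1:length(z)]) for j in 1:size(E, 1)]

# Broadcast form: z' is a 1×n row, so z' .^ E raises z[k] to E[j, k]
# for every row j; the row-wise product gives one value per monomial.
monomials_broadcast(z, E) = vec(prod(z' .^ E, dims=2))
broadcasted = monomials_broadcast(z, E)
```

Both forms give the same vector here; the layer output would then be W times that vector.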

Thanks for the suggestion. There is some improvement. I also needed to add

Base.convert(::Type{T}, x::Flux.Tracker.TrackedReal{T}) where T<:Real = x.data

but this has messed things up differently. Flux.jacobian still does not work.

I don’t think you want to do that, it will silently strip tracking information, and prevent you from finding bugs.
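
A toy illustration of the silent stripping, in plain Julia. The Toy type below is a made-up stand-in, not Flux's TrackedReal; it only shows that once such a convert method exists, writing into a Float64 array succeeds and quietly discards everything except the raw value.

```julia
# Toy stand-in for a tracked number (hypothetical, not Flux's TrackedReal):
# a value plus some bookkeeping that a real tracker would need for gradients.
struct Toy
    data::Float64
    id::Int
end

# Analogue of the convert definition above: return only the raw value.
Base.convert(::Type{Float64}, x::Toy) = x.data

buf = zeros(2)
buf[1] = Toy(3.0, 1)   # setindex! calls convert(Float64, ...), so no error
stored = buf[1]        # a plain Float64; the bookkeeping is gone
```

Without the convert method the assignment would throw, which is the error that points you at the real bug.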

Posting here just for completeness - rewriting the problem as per my answer in https://github.com/FluxML/Flux.jl/issues/597#issuecomment-461079720 makes it work.