I am trying to define a layer that is a polynomial in Flux. The definition seems to work, but any training with it diverges. It is also impossible to calculate the Jacobian of this polynomial function using Flux, which should be trivial. The following code illustrates the problem with the Jacobian. The question is: what am I doing wrong?
using Flux
using DynamicPolynomials
using MultivariatePolynomials
struct PolyModel
mexp
W
end
function PolyModel(in::Integer, out::Integer, order::Integer)
@polyvar x[1:in]
mx0 = monomials(x, 0:order)
mx1 = [exponents(mx0[k]) for k=1:length(mx0)]
return PolyModel(mx1, param(randn(out, length(mx0))))
end
Flux.@treelike PolyModel
function (a::PolyModel)(z::AbstractArray)
return a.W*[prod([z[k]^(a.mexp[j][k]) for k=1:length(z)]) for j=1:length(a.mexp)]
end
a = PolyModel(4,4,2)
Flux.jacobian(a,zeros(4))
ERROR: MethodError: no method matching Float64(::Flux.Tracker.TrackedReal{Float64})
Closest candidates are:
Float64(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:185
Float64(::T<:Number) where T<:Number at boot.jl:725
Float64(::Int8) at float.jl:60
…
Have not tried to run this, but your problem with ForwardDiff is that mon = ones(length(a.mexp))
creates a vector of Float64, but then mon[j] *= z[k]
tries to write a dual number into that space, and can’t. You can usually solve this by making sure any new array has the eltype of the input array, like (z::AbstractArray{T}) where T
and ones(T,...)
.
I suspect the problem with Flux may be the same, Float64(::TrackedReal{...
is a similar conversion problem. (Also, Flux does not support mutation of an existing array.) But this code is less obvious to me, what does @polyvar
do?
Flux generally likes array operations over scalar operations. Could the line a.W*[prod([z[k]^(a.mexp[j][k])...
be written without explicit indexing? Must mexp
be an array of arrays?
1 Like
Thanks for the suggestion. There is some improvement. I also needed to add
Base.convert(::Type{T}, x::Flux.Tracker.TrackedReal{T}) where T<:Real = x.data
but this has messed things up differently. Flux.jacobian
still does not work.
I don’t think you want to do that, it will silently strip tracking information, and prevent you from finding bugs.
Posting here just for completeness - rewriting the problem as per my answer in https://github.com/FluxML/Flux.jl/issues/597#issuecomment-461079720 makes it work.