My real goal is to include layers with matrices computed from known physics solutions; the training will then determine physical parameters appearing in the equations used to compute the matrix elements.
To get started, I created a toy problem with a 2x2 matrix. The equation generating matrix terms is:
function w(x::AbstractArray,y::Number)
[ 3x[1]*y 2x[1]*x[1] ; -1.25y*y 2.5*x[1]*x[1]*y]
end
In the toy example, I only want to estimate the value of x, so it is set up as a 1 element vector, while y is left as a Number.
I compute a “true solution” (training data) using x=[1], y=2, and initialize a model with a value x_guess=[1.2] as a single layer network, noting that the input to the network is a constant vector [1.;1.].
The model runs, the loss function works correctly, but the solution always converges to the same wrong answer, about 1.38.
The complete working example code is below, including some plotting lines that show that the error function has a minimum at x=1 as expected.
Does anyone have ideas why the network “training” does not get the right answer (i.e., spot the bug(s), sadly)? I’ve been staring at this but must be missing something; it should be able to get to zero error at x=1, but converges to the value x=1.38 with an error of 18.904071388502857.
using Flux
# trivial layer
struct mylayer{X <: AbstractArray, Y <: Number}
x::X
y::Y
end
# function that generates component of weight matrix
function w(x::AbstractArray,y::Number)
[ 3x[1]*y 2x[1]*x[1] ; -1.25y*y 2.5*x[1]*x[1]*y]
end
function (a::mylayer)(vec::AbstractArray)
x,y = a.x, a.y
wmat = w(x,y)
wmat*vec
end
# jazz it up with functor
Flux.@functor mylayer
# we want the true answer to be for x=y=1 - "input" is const [1.;1.]
y_true = w([1.0],2.0)*[1.0;1.0]
# now set up network configuration stuff
function loss(xx,yy)
ŷ = m([1.0,1.0])
Flux.mse(ŷ,yy)
end
opt = Descent(0.0005) # need to damp this or it blows up
# initialize model with [x]=1.2, y=2.0
x_guess = [1.02]
m = Chain(mylayer(x_guess,2.0))
# now train:
Flux.@epochs 15 Flux.train!(loss, params(m), zip(x_guess,y_true), opt, cb = () -> println("loss=$(loss([1.],y_true)) x=$(params(m))"))
# plot error function to show it is right, with minimum at [x]=[1.0]
using Plots
xvals = [x for x in 0:0.1:2.0]
err = [Flux.mse(w([x],2)*[1.;1.], y_true) for x in xvals]
Plots.plot(xvals,err)