Generic Function to train NN w/ Flux

I suspect a lot of the issues here stem from the problem setup and not just Flux. You may want to start by examining the condition number of your data matrix. I found k ≈ 14992.5 (the lower the condition number, the better conditioned the matrix). Such a large k means that when solving Ax = b with your data matrix, the solution is very sensitive to errors, which makes iterative numerical methods prone to accumulating them. You can see a sign of this in the results you posted, where simple optimizers like gradient descent or momentum produce NaNs.
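As a quick check (a minimal sketch; cond comes from the standard LinearAlgebra library, and Xbtrain is the bias-augmented training matrix defined in the code further down):

using LinearAlgebra

# Condition number of the bias-augmented training matrix. Roughly speaking,
# errors in the data or in intermediate computations can be amplified by a
# factor on the order of k when solving the least-squares problem.
k = cond(Xbtrain)
println("condition number: ", k)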

Why does X \ y work? For a non-square matrix, Julia's \ computes the least-squares solution directly (via a QR factorization), which is equivalent to applying the pseudoinverse. That is the exact solution to the minimum-loss problem for linear regression + MSE, and since it is computed by a direct method rather than an iterative one, it doesn't accumulate error in the same way (though it is not entirely free of numerical error either).
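To convince yourself that the direct solve and the pseudoinverse agree (another small sketch, again assuming the Xbtrain / ytrain variables from the code below):

using LinearAlgebra

ŵ_direct = Xbtrain \ ytrain          # QR-based least-squares solve
ŵ_pinv   = pinv(Xbtrain) * ytrain    # explicit Moore-Penrose pseudoinverse
# The two should differ only at the level of floating-point round-off.
println(maximum(abs.(ŵ_direct - ŵ_pinv)))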

To make sure I’m not just making stuff up, I used the Landweber iteration (which is essentially gradient descent for Ax = b with an MSE loss) to solve the problem. Here is the code:

using MLJ: @load_boston, partition
using Flux: mse, train!
using Flux, Statistics

# Uncomment the next line (and drop the synthetic data below) to use the real dataset:
# X, y = @load_boston; X = hcat(X...)
X = rand(354, 12)    # synthetic, well-conditioned data matrix
w = rand(12)
y = sign.(X * w)
train, test = partition(eachindex(y), 0.7, rng=333)

Xtrain = X[train, :]
Xtest = X[test, :]
Xbtrain = [Xtrain ones(size(Xtrain, 1))]
Xbtest = [Xtest ones(size(Xtest, 1))]
ytrain = y[train]
ytest = y[test]

rmse(ŷ, y) = sqrt(mse(ŷ, y))

function analytical(X, y, Xtest, ytest)
    ŵ = X \ y
    ŷ = X * ŵ
    ŷtest = Xtest * ŵ
    
    println(ŵ)

    return rmse(ŷ, y), rmse(ŷtest, ytest)
end

function flux(X, y, Xtest, ytest; opt, nepochs = 1000)
    m = Dense(size(X, 2), 1)
    traindata = Flux.Data.DataLoader(permutedims(X), y; batchsize = 32)
    for epoch in 1:nepochs
        # vec(...) so the 1×N model output matches the shape of the target vector
        train!((x, y) -> mse(vec(m(x)), y), params(m), traindata, opt)
    end

    return rmse(vec(m(permutedims(X))), y), rmse(vec(m(permutedims(Xtest))), ytest)
end

function landweber(X, y, Xtest, ytest; λ = 0.1, nepochs = 1000)
    traindata = Flux.Data.DataLoader(permutedims(X), y; batchsize = 32)
    ŵ = rand(size(X, 2))
    for epoch in 1:nepochs
        for (Xb, yb) in traindata
            # gradient step for the batch least-squares objective ‖transpose(Xb) * ŵ - yb‖²
            ŵ -= λ * Xb * (transpose(Xb) * ŵ - yb)
        end
    end
    
    ŷ = X * ŵ
    ŷtest = Xtest * ŵ
    
    return rmse(ŷ, y), rmse(ŷtest, ytest)
end

evalvector = [
    ("Analytical", () -> analytical(Xbtrain, ytrain, Xbtest, ytest)),
    ("Flux GD (8e-3)", () -> flux(Xtrain, ytrain, Xtest, ytest; opt = Descent(8e-3))),
    ("Landweber (1e-2)", () -> landweber(Xbtrain, ytrain, Xbtest, ytest; λ = 1e-2))
]

for (name, evalfunc) in evalvector
    errtrain, errtest = evalfunc()
    println("$name:")
    println("  RMSE (train): $errtrain")
    println("  RMSE (test):  $errtest")
end

If you uncomment the line at the top and use your dataset instead of the synthetic data, you’ll see that both Flux and the manually written Landweber iteration diverge. The learning rate needs to be around 1e-8 to 1e-10 to get them to converge, and even then the loss stays high. At that point, we can’t really slow the algorithms down any further, since the learning rate is approaching machine epsilon.
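One way to see why the step size has to be that small (a rough sketch; opnorm is from LinearAlgebra, and Xbtrain here would be the bias-augmented matrix built from your real data rather than the synthetic one): for plain gradient descent on a least-squares objective, the largest stable step size scales like 1 / σmax(X)², up to constant factors that depend on whether the loss is summed or averaged over the batch.

using LinearAlgebra

σmax = opnorm(Xbtrain)   # largest singular value of the data matrix
println("rough stable step-size scale: ", 1 / σmax^2)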

If you use the synthetic data, you’ll see that all three approaches converge. Flux still ends up with somewhat more error, perhaps due to sources of numerical error in the AD (whereas for the Landweber iteration I wrote out the gradient by hand).

The fact that ADAM and similar optimizers still get close is really a testament to how clever some of these optimization techniques are.
