Regularization with Flux

I’m currently training an ANN with 54 inputs and 12 outputs and I have already achieved good results by using the following model:

model = Chain(Dense(54,54,sigmoid), Dense(54,54,sigmoid), Dense(54,12,leakyrelu))

However, I’m trying to apply regularization in order to improve my results further. I’m currently using the mse loss function. I tried to implement regularization by doing:

opt = Optimiser(WeightDecay(lambda), ADAGrad())

I set lambda=1, but I’m not getting better results. Any ideas on how I could implement regularization?

Here’s my code for the ANN training:

using Flux

# lambda is taken as a Real so that fractional weight-decay values (e.g. 1e-4) can be tried
function flux_training(x_train::Array{Float64,2}, y_train::Array{Float64,2}, n_epochs::Int, lambda::Real)
    model = Chain(Dense(54, 54, sigmoid), Dense(54, 54, sigmoid), Dense(54, 12, leakyrelu))
    loss(x, y) = Flux.mse(model(x), y)
    ps = params(model)
    # Samples are stored as rows, so transpose to the (features, samples) layout Flux expects
    dataset = Flux.Data.DataLoader(x_train', y_train', batchsize = 32, shuffle = true)
    # WeightDecay adds an L2 penalty (scaled by lambda) to the gradient before the ADAGrad update
    opt = Optimiser(WeightDecay(lambda), ADAGrad())
    evalcb() = @show(loss(x_train', y_train'))
    for epoch in 1:n_epochs
        println("Epoch $epoch")
        time = @elapsed Flux.train!(loss, ps, dataset, opt, cb = throttle(evalcb, 3))
    end

    y_hat = model(x_train')'

    return y_hat, model
end

There is no guarantee that a regularizer like weight decay will improve your results. First of all, it probably needs some tuning of lambda. Second, your network is already quite small, so it may not suffer from overfitting but perhaps from underfitting, in which case the regularizer might make things worse.
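If you do want to keep weight decay, lambda usually needs to be much smaller than 1 and tuned on held-out data. Here is a minimal, hypothetical sketch of such a sweep, assuming the flux_training function above (with lambda accepted as a Real), data in x and y with samples as rows, and an arbitrary 80/20 split, epoch count and lambda grid:

# Hypothetical lambda sweep; `x`, `y`, the split, the epoch count and the
# lambda grid are all placeholder choices for illustration.
using Flux
using Random

Random.seed!(1)
n = size(x, 1)
idx = shuffle(1:n)
ntrain = round(Int, 0.8n)
tr, va = idx[1:ntrain], idx[ntrain+1:end]

for lambda in (0.0, 1e-4, 1e-3, 1e-2)
    _, m = flux_training(x[tr, :], y[tr, :], 100, lambda)
    val_mse = Flux.mse(m(x[va, :]'), y[va, :]')
    println("lambda = $lambda  validation MSE = $val_mse")
end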

How much data do you have? Have you tried dropout? It’s usually a better bet than L2 weight decay in my experience. Does performance go up or down if you increase the capacity/size of the model?
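For reference, dropout in Flux is just an extra layer in the Chain. Here is a minimal sketch of your architecture with dropout added (the 0.2 rate is an arbitrary starting point, not a recommendation):

using Flux

# Same architecture as before, with Dropout between the hidden layers.
# Dropout is active during training and should be disabled for evaluation,
# e.g. with testmode!(model) (recent Flux versions handle this automatically
# outside of gradient computation).
model = Chain(
    Dense(54, 54, sigmoid),
    Dropout(0.2),        # randomly zeroes 20% of activations each call
    Dense(54, 54, sigmoid),
    Dropout(0.2),
    Dense(54, 12, leakyrelu),
)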


Thanks for the answer! My training set has around 57,000 samples. Indeed, my network is small, so I guess I won’t need regularization after all (I tried different values for lambda but couldn’t improve my results). I calculated some metrics on the out-of-sample results and got, in the worst case, a MAPE of 1.4%, so I think the network is already a good fit. I didn’t know that regularization could make the results worse, so thanks for the explanation!

Weight decay adds a term to the cost function saying that the weights should be small in the L2 sense, and this can be at odds with having weights that fit the data well. In some situations there is reason to believe that small model parameters are better than large ones, but you can easily imagine that if you let lambda go to infinity, your weights will go to zero, and zero weights do not give you a good model.
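Written out explicitly, the penalised loss is just the data-fit term plus lambda times the squared L2 norm of the parameters. A sketch of that penalty form in Flux (roughly what WeightDecay does, although an adaptive optimiser like ADAGrad will not treat the two forms identically):

using Flux

model = Chain(Dense(54, 54, sigmoid), Dense(54, 54, sigmoid), Dense(54, 12, leakyrelu))
lambda = 1e-4   # illustrative value; needs tuning

# Squared L2 norm of every trainable parameter.
penalty() = sum(p -> sum(abs2, p), params(model))

# As lambda grows, the penalty dominates and drives the weights towards zero.
loss(x, y) = Flux.mse(model(x), y) + lambda * penalty()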

57k training samples sounds like a good amount. I would try to make the model larger until the validation error goes up; then you might have found a sweet spot without getting too complicated.
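A rough, hypothetical sketch of that search, reusing the x, y, tr and va placeholders from the earlier lambda sweep (the widths and epoch count are arbitrary):

using Flux

for width in (54, 108, 216, 432)
    model = Chain(Dense(54, width, sigmoid), Dense(width, width, sigmoid), Dense(width, 12, leakyrelu))
    loss(xb, yb) = Flux.mse(model(xb), yb)
    data = Flux.Data.DataLoader(x[tr, :]', y[tr, :]', batchsize = 32, shuffle = true)
    opt = ADAGrad()
    for epoch in 1:100
        Flux.train!(loss, params(model), data, opt)
    end
    val_mse = Flux.mse(model(x[va, :]'), y[va, :]')
    println("width = $width  validation MSE = $val_mse")
end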