How to apply Transfer Learning with Flux

I have a trained neural network that is already giving good results, and I’d like to use the parameters obtained previously as the starting parameters to train the ANN on a new task (basically, it’s a transfer learning problem). So far, I’m training the network with the following function:

using Flux
using Flux: params, throttle
using Flux.Optimise: Optimiser, WeightDecay, ADAGrad

function flux_training(x_train::Array{Float64,2}, y_train::Array{Float64,2}, n_epochs::Int, lambda::Int)
    model = Chain(Dense(54, 54, sigmoid), Dense(54, 54, sigmoid), Dense(54, 12, leakyrelu))
    loss(x, y) = Flux.mse(model(x), y)
    ps = params(model)
    # samples are stored as rows, so transpose to get features-by-observations
    dataset = Flux.Data.DataLoader(x_train', y_train', batchsize = 32, shuffle = true)
    opt = Optimiser(WeightDecay(lambda), ADAGrad())
    evalcb() = @show(loss(x_train', y_train'))
    for epoch in 1:n_epochs
        println("Epoch $epoch")
        time = @elapsed Flux.train!(loss, ps, dataset, opt, cb = throttle(evalcb, 3))
    end

    y_hat = model(x_train')'

    return y_hat, model
end

and I save the trained model by doing:

weights = params(model)
using BSON: @save
@save "mymodel.bson" weights

How can I initialize the weights in my training function as the values that were previously saved, to train the ANN for a new task?


I’m not sure how up-to-date it is, but there is an example of transfer learning in the Flux model zoo.


That example shows a nice way to freeze part of the weights and train the rest. If you just want to reload and retrain all the weights, the documentation has an example.
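In code, the freezing part essentially comes down to handing train! only the parameters you want to update. A rough sketch, reusing the model from the question and leaving the rest of the training loop out (the train! call is commented because it needs the DataLoader from the question):

using Flux
using Flux: params

model = Chain(Dense(54, 54, sigmoid), Dense(54, 54, sigmoid), Dense(54, 12, leakyrelu))

# take the parameters of the last layer only; the first two layers stay frozen
ps_last = params(model[3])

loss(x, y) = Flux.mse(model(x), y)

# only the arrays in `ps_last` are updated during training
# Flux.train!(loss, ps_last, dataset, ADAGrad())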

The model weights are initialized to random values when the layers are constructed (usually; you can change that too if you need to). If you then load new values or otherwise modify them, that becomes the starting point for training.
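For example, after constructing the model you can overwrite its parameter arrays with previously obtained values. A rough sketch, where the "previously trained" arrays are just copied from a stand-in model for illustration:

using Flux
using Flux: params

# suppose these are the arrays obtained from the previously trained network
# (here copied from a stand-in model, purely for illustration)
pretrained = Chain(Dense(54, 54, sigmoid), Dense(54, 54, sigmoid), Dense(54, 12, leakyrelu))
saved_weights = [copy(p) for p in params(pretrained)]

# a new model, built with the usual random initialization
model = Chain(Dense(54, 54, sigmoid), Dense(54, 54, sigmoid), Dense(54, 12, leakyrelu))

# overwrite the random values; these then become the starting point for training
for (p, w) in zip(params(model), saved_weights)
    copyto!(p, w)
end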


Thank you!

Thank you, @contradict!! This idea of initializing the weights with pre-defined values instead of random values is a nice possibility, but I’m not sure how to do that…

Flux has a function loadparams! which replaces the params of an existing model. It’s a bit clunky to use, as you need to keep the code that creates the model structure around.
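Something along these lines should work with the @save call from the question (a sketch, assuming the same Flux version and the same model definition):

using Flux
using BSON: @load

# the model structure has to be re-created in code before the parameters can be loaded
model = Chain(Dense(54, 54, sigmoid), Dense(54, 54, sigmoid), Dense(54, 12, leakyrelu))

# restore the saved `weights` object and copy its values into the new model
@load "mymodel.bson" weights
Flux.loadparams!(model, weights)

# `model` now starts from the previously trained parameters and can be trained on the new task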

I think BSON can save the whole Chain, so you don’t need to do this (i.e. do @save "mymodel.bson" model instead).
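For example (the filename here is just illustrative):

using Flux
using BSON: @save, @load

# save the whole Chain: structure and parameters together
@save "wholemodel.bson" model

# later (after `using Flux`, so the layer types are defined), bring it back
# without re-writing the model definition
@load "wholemodel.bson" model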

You could also try ONNXmutable.jl for longer-term storage.


The initializers are not particularly useful for that, since you have to specify them at layer creation time. This facility is mostly useful for experimenting with new layer types, or perhaps for scaling the random initialization to fit some peculiarity of your specific problem. loadparams! or one of the other methods @DrChainsaw mentioned is probably the correct solution to your problem.
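For completeness, an initializer is passed to the layer constructor via a keyword argument (named init in recent Flux versions, initW in older ones). A sketch, with a purely illustrative custom function:

using Flux

# a custom initializer receives the dimensions and returns the initial weight array
my_init(dims...) = 0.01f0 .* randn(Float32, dims...)

# keyword name depends on the Flux version: `init` in recent releases, `initW` in older ones
layer = Dense(54, 54, sigmoid; init = my_init)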
