Transfer Learning in Lux?

rkube · December 12, 2024, 12:13am

Hi,
I would like to transfer learn a regression model onto a classification task.

This is my regression model:

model = Chain(
    Conv((32, 16), 1 => 8),
    LayerNorm((129, 77, 8), relu, dims=(1, 2, 3)),
    Conv((32, 16), 8 => 16, stride=(2, 2)),
    LayerNorm((49, 31, 16), relu, dims=(1, 2, 3)),
    MaxPool((2, 2)), 
    Conv((16, 8), 16 => 16), 
    LayerNorm((9, 8, 16), relu, dims=(1, 2, 3)),
    MaxPool((2, 2)),  
    # Flatten and dense layers
    ReshapeLayer((256,)),
    Dropout(0.25),
    Dense(256, 128, relu),
    Dropout(0.15),
    Dense(128, 1)
)

From this, I then create a classifier by using Lux.Experimental.freeze:

model_clf = Chain(Lux.Experimental.freeze(model[1:12]), Dense(128, 1, sigmoid))
ps_clf, st_clf = Lux.setup(rng, model_clf) |> gpu

The pullback for this model takes over a minute:

    @time (loss, st_g), pb = Zygote.pullback(ps_clf) do p 
        Y_hat, st_ = model_clf(X_batch[1:160,:,:,:], p, st)
        Y_pred = clamp.(Y_hat[1, :], ϵ, 1f0-ϵ)
        sum(Y_pred), st_
        # loss_fn(Y_true, Y_pred), st_

    end

I tried timing it and @time reports 2.3 s. Then it continues compiling and, after about 1-2 minutes, produces a very long output:

 2.295934 seconds (110.39 k allocations: 5.660 MiB, 99.90% compilation time)
((15.384361f0, (layer_1 = (frozen_params = (layer_1 = (weight = Float32[0.015952695 0.007185958 … 0.02645357 0.057883892; -0.07097661 0.039056055 … 0.05905502 0.058483243; … ; -0.06422479 0.011287891 … 0.030074239 0.012998548; 0.04424595 -0.02495748 … -0.0018427239 0.013343713;;;; 0.04482291 -0.05909165 … 0.053648986 0.07476449; -0.03904024 0.03407839 … -0.01089952 -0.057322554; … ; -0.038908638 0.06761868 … 0.05761158 -0.018731067; -0.014519313 0.06123217 … -0.006092812 0.06165308;;;; -0.054698225 -0.047990892 … 0.059512276 -0.029810168; -0.05741847 -0.05390712 … -0.058472898 0.06503907; … ; -0.06078

I can then calculate gradients, which seem to be ok. But just the pullback takes a very long time.

What can I do to bring the gradient computation down to an acceptable time?

avikpal · December 12, 2024, 2:44am

Timing a closure like this leads to recompilation every time the snippet is evaluated (see the 99.90% compilation time). @btime (from BenchmarkTools.jl) would account for this.
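
A minimal sketch of why this happens, using only Base Julia. Each anonymous function (a `do` block included) gets its own type, so any function that receives it is compiled afresh whenever the snippet is re-evaluated. The names `apply` and `sumsq` are made up for illustration; `apply` stands in for something like Zygote.pullback.

```julia
# Hypothetical helper standing in for a higher-order function such as a pullback:
# Julia specializes it on the concrete type of `f`.
apply(f, x) = f(x)

x = rand(Float32, 1_000)

# Two textually identical closures still have two distinct types,
# so `apply` is compiled separately for each of them.
closure_a = v -> sum(abs2, v)
closure_b = v -> sum(abs2, v)
distinct_types = typeof(closure_a) != typeof(closure_b)

# A named function has one fixed type; only the first call pays for compilation.
sumsq(v) = sum(abs2, v)
t_first = @elapsed apply(sumsq, x)    # includes compilation
t_second = @elapsed apply(sumsq, x)   # already compiled

println("closures have distinct types: ", distinct_types)
println("first call: ", t_first, " s, second call: ", t_second, " s")
```

With BenchmarkTools.jl loaded, `@btime apply(sumsq, $x)` runs the call many times and reports the steady-state time, so the one-off compilation cost does not dominate the measurement.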

Generally I would recommend using the TrainState API since that ensures the code is written in a way to have no closures.
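
A rough sketch of what a TrainState-based training step could look like for a frozen backbone with a new classification head. The tiny Dense layers, the random data, and the choice of Adam with BinaryCrossEntropyLoss are assumptions for illustration, not the model from the question.

```julia
using Lux, Optimisers, Random, Zygote

rng = Random.default_rng()

# Small stand-in for the frozen feature extractor plus a new trainable head.
backbone = Chain(Dense(4 => 8, relu), Dense(8 => 8, relu))
model_clf = Chain(Lux.Experimental.freeze(backbone), Dense(8 => 1, sigmoid))
ps, st = Lux.setup(rng, model_clf)

# Dummy batch (assumption): 4 features, 16 samples, binary labels.
X = rand(rng, Float32, 4, 16)
Y = Float32.(rand(rng, Bool, 1, 16))

# TrainState bundles the model, parameters, states and optimizer state.
train_state = Lux.Training.TrainState(model_clf, ps, st, Adam(1.0f-3))

# One optimization step; no user-written closure is involved.
grads, loss, stats, train_state = Lux.Training.single_train_step!(
    AutoZygote(), BinaryCrossEntropyLoss(), (X, Y), train_state)

println("loss after one step: ", loss)
```

Calling `single_train_step!` in a loop over batches reuses the same compiled code, since the loss object and the model keep the same types between iterations.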

Regarding the performance, once the closure issue is out of the way, you should try out Reactant.jl (Compiling Lux Models using Reactant.jl | Lux.jl Docs). It is as simple as a four-line code change, but your model should be dramatically faster. One caveat is that Dropout currently doesn’t behave the way it should in Reactant (Random Numbers & Reactant · Issue #1131 · LuxDL/Lux.jl · GitHub), but I already have an upstream PR for that, so it should be resolved soon.
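
A hedged sketch of the Reactant workflow for a forward pass, assuming Reactant.jl is installed alongside Lux. The small model and input are placeholders, and Dropout is left out on purpose because of the caveat above.

```julia
using Lux, Random, Reactant

rng = Random.default_rng()

# Placeholder model (assumption); no Dropout layers.
model = Chain(Dense(4 => 8, relu), Dense(8 => 1, sigmoid))
ps, st = Lux.setup(rng, model)
x = rand(rng, Float32, 4, 16)

# Move data, parameters and states to the Reactant device.
dev = reactant_device()
x_ra = x |> dev
ps_ra = ps |> dev
st_ra = Lux.testmode(st) |> dev

# Compile the forward pass once, then reuse the compiled function.
model_compiled = @compile model(x_ra, ps_ra, st_ra)
y_ra, _ = model_compiled(x_ra, ps_ra, st_ra)

println(size(y_ra))
```

For training, the linked docs pair Reactant with the TrainState API and an Enzyme-based backend instead of Zygote; check them for the exact calls supported by your Lux version.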
