Hi,
I would like to use transfer learning to adapt a regression model to a classification task.
This is my regression model:
model = Chain(
    Conv((32, 16), 1 => 8),
    LayerNorm((129, 77, 8), relu, dims=(1, 2, 3)),
    Conv((32, 16), 8 => 16, stride=(2, 2)),
    LayerNorm((49, 31, 16), relu, dims=(1, 2, 3)),
    MaxPool((2, 2)),
    Conv((16, 8), 16 => 16),
    LayerNorm((9, 8, 16), relu, dims=(1, 2, 3)),
    MaxPool((2, 2)),
    # Flatten and dense layers
    ReshapeLayer((256,)),
    Dropout(0.25),
    Dense(256, 128, relu),
    Dropout(0.15),
    Dense(128, 1)
)
From this, I then create a classifier using Lux.Experimental.freeze:
model_clf = Chain(Lux.Experimental.freeze(model[1:12]), Dense(128, 1, sigmoid))
ps_clf, st_clf = Lux.setup(rng, model_clf) |> gpu
The pullback for this model takes over a minute:
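@time (loss, st_g), pb = Zygote.pullback(ps_clf) do p
    # Forward pass through the frozen feature extractor plus the new sigmoid head
    Y_hat, st_ = model_clf(X_batch[1:160,:,:,:], p, st_clf)
    # Clamp predictions away from 0 and 1
    Y_pred = clamp.(Y_hat[1, :], ϵ, 1f0-ϵ)
    sum(Y_pred), st_
    # loss_fn(Y_true, Y_pred), st_
end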
When I time it, @time reports 2.3 s. It then appears to keep compiling, and after about 1-2 minutes it prints a very long output:
2.295934 seconds (110.39 k allocations: 5.660 MiB, 99.90% compilation time)
((15.384361f0, (layer_1 = (frozen_params = (layer_1 = (weight = Float32[0.015952695 0.007185958 … 0.02645357 0.057883892; -0.07097661 0.039056055 … 0.05905502 0.058483243; … ; -0.06422479 0.011287891 … 0.030074239 0.012998548; 0.04424595 -0.02495748 … -0.0018427239 0.013343713;;;; 0.04482291 -0.05909165 … 0.053648986 0.07476449; -0.03904024 0.03407839 … -0.01089952 -0.057322554; … ; -0.038908638 0.06761868 … 0.05761158 -0.018731067; -0.014519313 0.06123217 … -0.006092812 0.06165308;;;; -0.054698225 -0.047990892 … 0.059512276 -0.029810168; -0.05741847 -0.05390712 … -0.058472898 0.06503907; … ; -0.06078
I can then calculate the gradients, which seem to be OK, but the pullback alone takes a very long time.
What can I do to reduce the gradient computation to an acceptable time?
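To narrow this down, here is a minimal sketch of how the timing could be split into compilation, execution, and display. It uses only Base Julia and a hypothetical stand-in function, slow_to_show, instead of my actual model, so it only illustrates the measurement approach. I am assuming (not verified) that part of the delay may come from the REPL printing the large returned value rather than from the pullback itself.

```julia
# Hypothetical stand-in for the pullback call (not the real model):
# returns a scalar plus a large array, similar in shape to (loss, state).
slow_to_show(n) = (sum(abs2, 1:n), rand(Float32, n, n))

# First call: the measured time includes JIT compilation for these argument types.
t_first = @elapsed slow_to_show(256)

# Second call with the same argument types: the compiled code is reused,
# so this measures execution only.
t_second = @elapsed slow_to_show(256)

# At the REPL, a trailing semicolon suppresses printing of the returned value,
# so display time for a large nested result is not mistaken for compute time.
result = slow_to_show(256);

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

Applied to the code above, that would mean running the @time line twice and ending it with a semicolon, then comparing the two timings.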