Cheers,
I am struggling to pinpoint why a certain FCN with fewer than 1M parameters trains much more slowly than another one with 30M parameters. When I timed the training loop, I found that on every epoch roughly 90% of the total execution time was reported as compilation!
To investigate, I ran the code below with a simple model on several computers:
using Flux

# Minimal 1x1 convolution model and MSE loss
model = Flux.Conv((1, 1), 3 => 1)
loss(yhat, y) = Flux.mse(yhat, y)

opt = Flux.Adam()
optstate = Flux.setup(opt, model)

# Dummy data: one 128x128x3 image and random targets of matching size
X = rand(Float32, (128, 128, 3, 1))
s = size(model(X))
Y = rand(Bool, s)
data = Flux.DataLoader((X, Y); batchsize=1)

@time Flux.train!(model, data, optstate) do m, x, y
    loss(m(x), y)
end
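The five timings per machine below come from repeating the @time'd train! call. A minimal sketch of such a loop (my reconstruction, assuming the setup above) would be:

# Sketch: repeat the timed call to see whether the compilation share
# drops after the first (warm-up) run. Assumes the model, loss, data
# and optstate defined above.
for run in 1:5
    @time Flux.train!(model, data, optstate) do m, x, y
        loss(m(x), y)
    end
end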
The results on two different CPUs (both running Ubuntu, no GPU) were as follows:
WSL, Intel Core i7 CPU
12.385664 seconds (16.22 M allocations: 1.038 GiB, 3.96% gc time, 99.97% compilation time)
0.142981 seconds (331.26 k allocations: 22.813 MiB, 98.65% compilation time)
0.150819 seconds (331.25 k allocations: 22.782 MiB, 4.89% gc time, 98.80% compilation time)
0.142561 seconds (331.26 k allocations: 22.765 MiB, 98.55% compilation time)
0.147581 seconds (331.26 k allocations: 22.772 MiB, 4.23% gc time, 98.46% compilation time)
Ubuntu ARM Ampere CPU
16.378631 seconds (16.22 M allocations: 1.036 GiB, 3.62% gc time, 99.92% compilation time)
0.267535 seconds (331.26 k allocations: 21.639 MiB, 98.50% compilation time)
0.231993 seconds (331.25 k allocations: 21.656 MiB, 5.69% gc time, 98.35% compilation time)
0.216757 seconds (331.26 k allocations: 21.643 MiB, 98.35% compilation time)
0.226803 seconds (331.26 k allocations: 21.648 MiB, 4.76% gc time, 98.53% compilation time)
In both cases, the first execution of train! was slow, as expected, and the subsequent ones were much faster. However, they still reported an unexpectedly high compilation percentage.
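If it helps, here is what I would try next (just a hedged sketch on my part, assuming the same setup as above): time many epochs inside a single @time call, so that if the reported compilation time is really a roughly constant per-call overhead, its percentage should shrink as the real work grows.

# Sketch (untested idea): cover many epochs with one @time call and
# check whether the compilation percentage drops. Assumes the model,
# loss, data and optstate defined above.
@time for epoch in 1:100
    Flux.train!(model, data, optstate) do m, x, y
        loss(m(x), y)
    end
end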
Could this be a bug in Julia, or is it expected behavior?
Thanks in advance.