Flux example 100x slower with L2 regularisation(?!)

I am running a network in Flux/ADAM without regularisation which does 100 epochs in about as many seconds.

When I add an L2 penalty of 0.01*sum(abs2, m[2].W) to the loss function and train, it takes 90-100 seconds per epoch on exactly the same data and network.

Can anyone think of a reason this might be happening, and how I might go about fixing it?

I am running a 100x50x50x1 MLP on 8 cores (Linux).

Thanks for any thoughts.

Maybe the 0.01 makes things promote to Float64 and some slower fallback is used? But if you are running on a CPU, that might be unlikely.


Thanks. I will try that and let you know. Is there another way to set \lambda?

That is the answer! I converted 0.01 to Float32 and it was off and running again!
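
For anyone landing here later, a minimal sketch of the promotion and two ways around it. The matrix `W` is a hypothetical stand-in for `m[2].W`, assumed to hold Float32 weights; no Flux is needed to see the effect.

```julia
# Hypothetical stand-in for the layer weights m[2].W (assumed Float32)
W = randn(Float32, 50, 50)

# The literal 0.01 is a Float64, so the product is promoted to Float64
penalty64 = 0.01 * sum(abs2, W)

# A Float32 literal keeps the penalty in Float32
penalty32 = 0.01f0 * sum(abs2, W)

# Or convert the constant to whatever element type the weights have
λ = eltype(W)(0.01)
penaltyλ = λ * sum(abs2, W)

println(typeof(penalty64))
println(typeof(penalty32))
println(typeof(penaltyλ))
```

The `eltype(W)(0.01)` form is handy if the model's precision might change later, since the constant follows the weights.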

This seems to have been worked on lately, BTW. Here’s a factor of 100 (CPU):

julia> @btime x*y  setup=(x=randn(100,100); y=randn(100,100));
  9.248 μs (2 allocations: 78.20 KiB)

julia> @btime x*y  setup=(x=randn(Float32, 100,100); y=randn(Float32, 100,100));
  6.198 μs (2 allocations: 39.14 KiB)

julia> @btime x*y  setup=(x=randn(Float32, 100,100); y=randn(100,100));
  711.042 μs (8 allocations: 78.53 KiB)

julia> VERSION

but on v1.5 it's more like a factor of 3 for me. I guess it converts one array first.
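
If you want to avoid the mixed-precision path altogether, one option is to convert explicitly before multiplying. A small sketch, assuming the same array shapes as in the benchmark above; `y32` is a name introduced here for illustration only.

```julia
x = randn(Float32, 100, 100)
y = randn(100, 100)        # Float64 by default

z_mixed = x * y            # mixed element types, result is promoted to Float64

y32 = Float32.(y)          # convert once, up front
z32 = x * y32              # both arguments Float32, result stays Float32

println(eltype(z_mixed))
println(eltype(z32))
```

Doing the conversion once outside the training loop avoids repeating it on every call.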

Hi. I was running on 1.5.2