Flux example 100x slower with L2 regularisation(?!)

compleat · November 10, 2020, 1:53pm

I am running a network in Flux/ADAM without regularisation which does 100 epochs in about as many seconds.

When I add an L2-penalty of 0.01*sum(abs2 , m[2].W) to the loss function and train, it takes 90-100 seconds per epoch on exactly the same data and network.

Can anyone think of a reason this might be happening, and how I might go about fixing it?

I am running a 100x50x50x1 MLP on 8 cores (linux)

Thanks for any thoughts.

kristoffer.carlsson · November 10, 2020, 2:18pm

Maybe the 0.01 makes things promote to Float64 and some slower fallback is used? But if you are running on a CPU, that might be unlikely.

compleat · November 10, 2020, 2:20pm

Thanks. I will and will let you know. Is there another way to set \lambda?

compleat · November 10, 2020, 2:27pm

That is the answer! I convered 0.01 to Float32 and it was off and running again!
Thanks!

mcabbott · November 10, 2020, 2:42pm

This seems to have been worked on lately, BTW. Here’s a factor of 100 (CPU):

julia> @btime x*y  setup=(x=randn(100,100); y=randn(100,100));
  9.248 μs (2 allocations: 78.20 KiB)

julia> @btime x*y  setup=(x=randn(Float32, 100,100); y=randn(Float32, 100,100));
  6.198 μs (2 allocations: 39.14 KiB)

julia> @btime x*y  setup=(x=randn(Float32, 100,100); y=randn(100,100));
  711.042 μs (8 allocations: 78.53 KiB)

julia> VERSION
v"1.3.0"

but on v1.5, it’s more like a factor of 3 for me, I guess it converts one array first.

compleat · November 10, 2020, 2:51pm

Hi. I was running on 1.5.2

Topic		Replies	Views
Flux running slow? Machine Learning	16	2734	August 19, 2021
Flux multi-cpu parallelism? New to Julia question , flux , zygote	9	2931	November 21, 2020
Performance issues? New to Julia question , flux	9	1011	September 12, 2020
Flux slows down by 10x when moving from local system to high performance cluster Performance flux	10	289	September 24, 2024
My Flux Application painfully slow General Usage question	21	1494	October 20, 2020

Flux example 100x slower with L2 regularisation(?!)

Related topics