I am training a network in Flux with the ADAM optimiser and no regularisation, and it completes 100 epochs in roughly 100 seconds (about 1 s/epoch).
When I add an L2 penalty of `0.01 * sum(abs2, m[2].W)` to the loss function and train, it takes 90-100 seconds *per epoch* on exactly the same data and network, i.e. roughly a 100x slowdown.
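In case it helps responders, here is a minimal sketch of what I mean (the model shape, data, and base loss below are placeholders, not my actual code; only the penalty term is verbatim):

```julia
using Flux

# Placeholder model and data -- my real network differs, but the structure
# is the same: m[2].W is the weight matrix of the second layer.
m = Chain(Dense(10, 32, relu), Dense(32, 1))
x, y = rand(Float32, 10, 100), rand(Float32, 1, 100)

# Fast version: plain loss, ~1 s/epoch.
loss_plain(x, y) = Flux.mse(m(x), y)

# Slow version: same loss plus the L2 penalty, ~90-100 s/epoch.
loss_l2(x, y) = Flux.mse(m(x), y) + 0.01f0 * sum(abs2, m[2].W)
```

The only difference between the two runs is that penalty term.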
Can anyone think of a reason this might be happening, and how I might go about fixing it?