I’m trying to implement weight regularisation, which requires adding a weight-dependent penalty to the loss function. When I proceed naively, I run into “Mutating arrays is not supported”, but I don’t understand what arrays I am mutating. Here’s an MWE:
julia> using Flux

julia> c = rand(3);

julia> gs = gradient(params(c)) do
           sum(sum.(params(c)))
       end
ERROR: Mutating arrays is not supported
If I delete the second `params` call, the error disappears; but in my real problem I have no other explicit access to the parameters.
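That is, this variant runs without complaint, but only because c itself happens to be the parameter array, which is not the case in my real problem:

julia> gs = gradient(params(c)) do
           sum(sum.(c))   # no params call inside the closure
       end
Grads(...)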
Sure, @ChrisRackauckas (and thanks), but that doesn’t solve my problem: in my case c is not an array but some generic Flux chain. I don’t have access to the individual parameters, because responsibility for constructing c lies elsewhere. Also, I cannot see any whiff of an array mutation in what I have written here, so what’s the explanation for the error?
In my problem I have a Flux model, c say. I don’t know how it was constructed, but I can extract its parameters with ps = params(c). I need the parameters to compute a weight penalty. For L2 regularisation, for example, I’m doing something like this, but it doesn’t work:
lambda = 0.1   # regularisation strength
X, Y = ...     # some input/target training data

# the L2 penalty term calls params(c) inside the loss
loss(x, y) = Flux.Losses.mse(c(x), y) + lambda*sum(p -> sum(p.^2), params(c))

gs = gradient(params(c)) do
    loss(X[i], Y[i])
end
# and so forth
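For concreteness, here is a self-contained version with a made-up stand-in chain and data (the layer sizes, data shapes, and lambda value are placeholders, not my real model), which fails in the same way:

using Flux
using Flux.Losses: mse

c = Chain(Dense(4, 8, relu), Dense(8, 1))   # stand-in for the chain built elsewhere

lambda = 0.1f0
X = [rand(Float32, 4) for _ in 1:8]   # made-up inputs
Y = [rand(Float32, 1) for _ in 1:8]   # made-up targets

loss(x, y) = mse(c(x), y) + lambda*sum(p -> sum(p.^2), params(c))

gs = gradient(params(c)) do   # params(c) is also called inside loss
    loss(X[1], Y[1])
end
# ERROR: Mutating arrays is not supported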
This might be Flux#1588, for which there appear to be a dozen overlapping PRs. What seems to work is not calling params inside gradient, and not using broadcasting:
julia> using Zygote, Flux

julia> c = rand(3);

julia> ps = params(c);

julia> gs = gradient(params(c)) do
           sum(sum(x) for x in ps)
       end
Grads(...)

julia> gs[c]
3-element Fill{Float64}: entries equal to 1.0
AIUI #1588 should’ve been fixed by FluxML/Zygote.jl#992 (Differentiate `push!` with implicit Params, by DhairyaLGandhi), but there’s some funkiness going on that nobody has a good grasp on yet. In general though, having params work in a gradient context seems an unfortunate accident of history and probably should not have been allowed. 99% of the time it incurs extra computation and allocation unnecessarily.
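If you do need the penalty, the cheap pattern is to collect the parameters once, outside the gradient call, and reuse them. A minimal sketch, assuming an L2 penalty and an arbitrary stand-in chain (the model, data, and lambda here are placeholders):

using Flux

c = Chain(Dense(4, 8, relu), Dense(8, 1))   # stand-in model
ps = params(c)                              # collected once, outside the gradient

sqnorm(p) = sum(abs2, p)                    # squared L2 norm of one parameter array
lambda = 0.1f0
x, y = rand(Float32, 4, 16), rand(Float32, 1, 16)   # made-up batch

gs = gradient(ps) do
    Flux.Losses.mse(c(x), y) + lambda*sum(sqnorm, ps)   # no params call, no broadcasting
end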
@ToucheSir Thanks for that. I shall copy you in on the upcoming PR where my issue arose; perhaps you can suggest a more performant approach there. Thanks.