Weight regularisation which iterates params(m) in Flux -- Mutating arrays is not supported

I’m trying to implement weight regularisation which requires adding a weight-dependent penalty to the loss function. When I proceed naively, I run into “Mutating arrays is not supported” but I don’t understand what arrays I am mutating. Here’s a MWE:

c = rand(3)
gs = gradient(params(c)) do
    sum(x -> sum(x.^2), params(c))
end
ERROR: Mutating arrays is not supported

If I delete the second params the error disappears, but in my problem I don’t have explicit access to the params any other way.

Where is the mutation?
What is the workaround?

Even this throws an error:

c = rand(3)
gs = gradient(params(c)) do
    sum(sum(x) for x in params(c))
end

Don’t call params in the gradient call.
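A minimal sketch of that advice applied to the MWE above: collect the parameters into a variable before calling gradient, and refer only to that variable inside the closure (the `abs2` form here is an assumption standing in for whatever penalty you use).

```julia
using Flux, Zygote

c = rand(3)
ps = Flux.params(c)   # collect parameters once, outside gradient

gs = gradient(ps) do
    sum(sum(abs2, p) for p in ps)  # no params(...) call inside the closure
end

gs[c]  # gradient of the sum of squares: 2 .* c
```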


Sure @ChrisRackauckas (and thanks), but that doesn’t solve my problem: c is not an array in my real code but some generic Flux chain. I don’t have access to the individual parameters, because responsibility for constructing c lies elsewhere. Also, I cannot see any whiff of an array mutation in what I have written here, so what’s the explanation for the error?

In my problem I have a Flux model, c say. I don’t know how it was constructed, but I can extract its parameters with ps = params(c). I need the parameters to compute a weight penalty. For example, for L2 regularisation, I’m doing something like this, but it doesn’t work:

lambda = 0.1 # regularisation strength
X, Y = # some input/target training data
loss(x, y) = Flux.mse(c(x), y) + lambda*sum(x -> sum(x.^2), params(c))
gs = Flux.gradient(params(c)) do
    loss(X[i], Y[i])
end
# and so forth

This might be Flux#1588, for which there appear to be a dozen overlapping PRs. What seems to work is not calling params inside gradient, and not using broadcasting:

julia> using Zygote, Flux

julia> c = rand(3);

julia> ps = params(c);

julia> gs = gradient(params(c)) do
              sum(sum(x) for x in ps)
          end
Grads(...)

julia> gs[c]
3-element Fill{Float64}: entries equal to 1.0

AIUI #1588 should’ve been fixed by Differentiate `push!` with implicit Params by DhairyaLGandhi · Pull Request #992 · FluxML/Zygote.jl · GitHub, but there’s some funkiness going on that nobody has a good grasp on yet. In general though, having params work in a gradient context seems an unfortunate accident of history and should probably not have been allowed. 99% of the time, it’s incurring extra computation and allocation unnecessarily.
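Putting the two pieces of advice together for the original use case (a model built elsewhere), here’s a sketch with a hypothetical two-layer chain standing in for c — params is called exactly once, outside gradient, and the penalty avoids broadcasting:

```julia
using Flux

# hypothetical stand-in for the chain constructed elsewhere
c = Chain(Dense(4, 8, relu), Dense(8, 1))
ps = Flux.params(c)                 # collect parameters once, outside gradient

lambda = 0.1
penalty() = lambda * sum(sum(abs2, p) for p in ps)   # L2 penalty, no broadcasting

X, Y = rand(4, 16), rand(1, 16)     # dummy input/target batch
gs = gradient(ps) do
    Flux.Losses.mse(c(X), Y) + penalty()
end
```

Each parameter array then has its gradient in gs, e.g. gs[first(ps)].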

@mcabbot Your workaround works. Thanks a million.

@ToucheSir Thanks for that. I shall copy you in on the upcoming PR where my issue arose, and perhaps you can suggest a more performant approach there.