I’m trying to implement weight regularisation, which requires adding a weight-dependent penalty to the loss function. When I proceed naively, I run into “Mutating arrays is not supported”, but I don’t understand what arrays I am mutating. Here’s an MWE:
julia> using Flux

julia> c = rand(3);

julia> gs = gradient(params(c)) do
           sum(sum.(params(c)))
       end
ERROR: Mutating arrays is not supported
If I delete the second `params` call, the error disappears; but in my real problem I have no other explicit access to the parameters.
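That is, this variant runs without complaint, but only because c itself happens to be the parameter array, which is not the case in my real problem:

julia> gs = gradient(params(c)) do
           sum(sum.(c))   # no params call inside the closure
       end
Grads(...)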
Sure, @ChrisRackauckas (and thanks), but that doesn’t solve my problem: in my case c is not an array but some generic Flux chain. I don’t have access to the individual parameters, because responsibility for constructing c lies elsewhere. Also, I cannot see any whiff of an array mutation in what I have written here, so what’s the explanation for the error?
In my problem I have a Flux model, c say. I don’t know how it was constructed, but I can extract its parameters with ps = params(c). I need the parameters to compute a weight penalty. For L2 regularisation, for example, I’m doing something like this, but it doesn’t work:
lambda = 0.1   # regularisation strength
X, Y = ...     # some input/target training data

# the L2 penalty term calls params(c) inside the loss
loss(x, y) = Flux.Losses.mse(c(x), y) + lambda*sum(p -> sum(p.^2), params(c))

gs = gradient(params(c)) do
    loss(X[i], Y[i])
end
# and so forth
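For concreteness, here is a self-contained version with a made-up stand-in chain and data (the layer sizes, data shapes, and lambda value are placeholders, not my real model), which fails in the same way:

using Flux
using Flux.Losses: mse

c = Chain(Dense(4, 8, relu), Dense(8, 1))   # stand-in for the chain built elsewhere

lambda = 0.1f0
X = [rand(Float32, 4) for _ in 1:8]   # made-up inputs
Y = [rand(Float32, 1) for _ in 1:8]   # made-up targets

loss(x, y) = mse(c(x), y) + lambda*sum(p -> sum(p.^2), params(c))

gs = gradient(params(c)) do   # params(c) is also called inside loss
    loss(X[1], Y[1])
end
# ERROR: Mutating arrays is not supported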
This might be Flux#1588, for which there appear to be a dozen overlapping PRs. What seems to work is not calling params inside gradient, and not using broadcasting:
julia> using Zygote, Flux

julia> c = rand(3);

julia> ps = params(c);

julia> gs = gradient(params(c)) do
           sum(sum(x) for x in ps)
       end
Grads(...)

julia> gs[c]
3-element Fill{Float64}: entries equal to 1.0
AIUI #1588 should’ve been fixed by FluxML/Zygote.jl#992 (Differentiate `push!` with implicit Params, by DhairyaLGandhi), but there’s some funkiness going on that nobody has a good grasp on yet. In general though, having params work in a gradient context seems an unfortunate accident of history and probably should not have been allowed. 99% of the time it incurs extra computation and allocation unnecessarily.
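If you do need the penalty, the cheap pattern is to collect the parameters once, outside the gradient call, and reuse them. A minimal sketch, assuming an L2 penalty and an arbitrary stand-in chain (the model, data, and lambda here are placeholders):

using Flux

c = Chain(Dense(4, 8, relu), Dense(8, 1))   # stand-in model
ps = params(c)                              # collected once, outside the gradient

sqnorm(p) = sum(abs2, p)                    # squared L2 norm of one parameter array
lambda = 0.1f0
x, y = rand(Float32, 4, 16), rand(Float32, 1, 16)   # made-up batch

gs = gradient(ps) do
    Flux.Losses.mse(c(x), y) + lambda*sum(sqnorm, ps)   # no params call, no broadcasting
end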
@ToucheSir Thanks for that. I shall copy you in on the upcoming PR where my issue arose; perhaps you can suggest a more performant approach there. Thanks.