Mutating array in gradients?

I’m a Julia newbie switching over from Python! I’m trying to implement a Bayesian neural net for some practice.

When I try to track gradients in my model, I get the following error: “Mutating arrays is not supported”. I think I’ve misunderstood something about Zygote… Where in my model am I mutating arrays?

```julia
using Flux, Statistics

struct VariationalAffine{F<:Function,S<:AbstractArray,T<:AbstractArray,
                         U<:Array,V<:Array,I<:Integer}
    W::S
    logσW::T
    b::U
    logσb::V
    Wsample::Array
    bsample::Array
    σ::F
    η::I
end

function VariationalAffine(in::Integer, out::Integer, η::Integer, σ=identity)
    W = glorot_uniform(out, in) .+ 0.0
    logσW = -6.0 .+ glorot_uniform(out, in)
    b = zeros(out)
    logσb = -6.0 .+ glorot_uniform(out)
    return VariationalAffine(W, logσW, b, logσb, reparamSample(W, logσW, η),
                             reparamSample(b, logσb, η), σ, η)
end

function (m::VariationalAffine)(x::Array)::Array
    m.Wsample .= reparamSample(m.W, m.logσW, m.η)
    m.bsample .= reparamSample(m.b, m.logσb, m.η)
    linear = m.Wsample .* x
    return σArray(((li, bi) -> li .+ bi).(linear, m.bsample), m.σ)
end

# Draw samples from the weights using the reparameterization trick
function reparamSample(A::Array{<:Number}, logσA::Array{<:Number}, η::Integer)
    return [A .+ randn(size(A)...) .* exp.(logσA) for i in 1:η]
end

# Apply the activation function to an array of arrays
function σArray(A::Array, σ)::Array
    return (Ai -> σ.(Ai)).(A)
end
```

```julia
struct BayesNN{I<:Integer,L<:Array,P<:Flux.Zygote.Params}
    in::I
    out::I
    layers::L
    θ::P
end

function BayesNN(in::Integer, out::Integer, num_layers::Integer,
                 num_hidden::Integer, η::Integer, σ=relu)
    # putting layers into an array
    layers = [VariationalAffine(in, num_hidden, η, σ),]
    hidden_layers = [VariationalAffine(num_hidden, num_hidden, η, σ)
                     for i in 1:(num_layers - 1)]
    append!(layers, hidden_layers)
    append!(layers, [VariationalAffine(num_hidden, out, η, σ),])
    # collecting into a parameter array
    P = [layers[1].W, layers[1].b]
    (L -> append!(P, [L.W, L.b])).(layers[2:end])
    return BayesNN(in, out, layers, Flux.params(P))
end

(bm::BayesNN)(x) = foldl((x, bm) -> bm(x), bm.layers, init=x)

# Mean squared loss for a BayesNN
function bnnLoss(y, ŷ)
    sqrdiff = (d -> (d).^2).(ŷ .- [y])
    return mean(mean.(sqrdiff))
end
```

```julia
bnn = BayesNN(in, out, 10, 50, η);

loss(x, y) = bnnLoss(y, bnn([x'])')

gradient(() -> loss(x, y), bnn.θ)
```

`.=` mutates an array. Replacing it with a regular `=` might make this work (it might not, though; I haven’t used Flux in quite some time).
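To see the difference, here’s a minimal standalone sketch (not your model, just the two patterns) of how Zygote reacts to in-place assignment versus building a new array:

```julia
using Zygote

# mutating version: writes into a preallocated buffer with .=
function f_mut(x)
    y = zeros(length(x))
    y .= 2 .* x        # in-place broadcast assignment mutates y
    return sum(y)
end

# non-mutating version: just builds a new array
f_ok(x) = sum(2 .* x)

gradient(f_ok, [1.0, 2.0])      # ([2.0, 2.0],)
# gradient(f_mut, [1.0, 2.0])   # throws "Mutating arrays is not supported"
```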

If you don’t mind a note on the code: `(x->f.(x)).(y)` is really hard to read. Usually, `map` and `foreach` can serve the same purpose in a readable way. Consider:

```julia
function bnnLoss(y, ŷ)
    sqrdiff = map(yh -> (yh .- y).^2, ŷ)
    return mean(mean.(sqrdiff))
end
```

```julia
# you could also do this, which is pretty neat:
# the inner `mean` takes a function as its first
# argument, which it applies to each element before
# `mean`ing it
function bnnLoss(y, ŷ)
    return mean(mean(yh -> (yh .- y).^2, ŷ))
end
```

Arguably more egregious, however, is that newcomers to Julia tend to “over-type” their code, usually because they think it affects performance, but sometimes for other reasons. It is usually best to keep code fairly general (even “under-typed”) for better reusability and readability. For example, consider what would happen if `reparamSample` were suddenly applied to a sparse array: it would error, because the signature only accepts `Array`. It would have been better not to provide a type at all.
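As a toy illustration (`addnoise` is a made-up function, not from your code), the untyped version works on anything that supports broadcasting:

```julia
using SparseArrays

# tightly typed: only a plain Array is accepted
addnoise_typed(A::Array) = A .+ randn(size(A)...)

# untyped: also works for ranges, sparse matrices, GPU arrays, ...
addnoise(A) = A .+ randn(size(A)...)

addnoise(1:3)                           # fine: a range broadcasts
addnoise(sparse([1.0 0.0; 0.0 1.0]))    # fine: so does a sparse matrix
# addnoise_typed(sparse([1.0 0.0; 0.0 1.0]))  # MethodError
```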

Thanks for the response! Unfortunately I want to keep track of `Wsample` inside the `VariationalAffine` struct, so changing `.=` to `=` didn’t work in this case.

I appreciate the notes on style! I’ve made the updates to make my code more readable.

And you’re right that I was typing everything as much as possible in the hope that it would buy me some performance. Thanks for pointing out that it isn’t required.

From here, I believe the correct way to do this is to make the struct mutable and “swap out” the arrays after each application of the layer. I based that on how `RNNCell` treats `h`, though that isn’t an array. On a second read-through, perhaps the right thing is to make `VariationalAffine` a more “cell”-like layer (like `LSTMCell` and `RNNCell`) and wrap it in `Recur` to keep track of the state.
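Something like this sketch is what I have in mind (`VariationalAffineCell` is a made-up name, and I’m assuming `Recur`’s two-argument `(cell, state)` constructor; the cell returns its new state instead of mutating):

```julia
using Flux

# hypothetical cell-style layer: calling it returns (new state, output),
# the way RNNCell returns its new hidden state
struct VariationalAffineCell{F,A,B,I}
    W::A; logσW::A
    b::B; logσb::B
    σ::F; η::I
end

reparamSample(A, logσA, η) = [A .+ randn(size(A)...) .* exp.(logσA) for _ in 1:η]

function (c::VariationalAffineCell)(state, x)
    Wsample = reparamSample(c.W, c.logσW, c.η)
    bsample = reparamSample(c.b, c.logσb, c.η)
    y = map((Wi, bi) -> c.σ.(Wi * x .+ bi), Wsample, bsample)
    return (Wsample, bsample), y
end

cell = VariationalAffineCell(randn(3, 2), fill(-6.0, 3, 2),
                             zeros(3), fill(-6.0, 3), relu, 4)
layer = Flux.Recur(cell, (nothing, nothing))  # Recur keeps the state between calls
y = layer(randn(2))  # a length-η vector of sampled outputs
```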

This is a common early misconception. The key to performance is type-stability.
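A quick illustration of what type-stability means:

```julia
# type-unstable: for a Float64 input this returns either a Float64 or the
# Int literal 0, so the compiler can only infer Union{Float64, Int64}
unstable(x) = x > 0 ? x : 0

# type-stable: always returns the same type as x
stable(x) = x > 0 ? x : zero(x)

stable(-1.0)    # 0.0 (a Float64)
unstable(-1.0)  # 0   (an Int!)
# @code_warntype unstable(1.0) highlights the unstable return type
```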

Thank you, I seriously appreciate you taking the time to help! Your proposed solution makes sense – I’ll make the changes. Take care