Mutating arrays in gradients?

I’m a Julia newbie trying to switch over from Python! I’m trying to implement a Bayesian neural net for some practice.

When trying to track gradients in my model I’m getting the following error: “Mutating arrays is not supported”. I think there’s something I’ve misunderstood with Zygote… Where in my model am I mutating arrays?

struct VariationalAffine{F<:Function,S<:AbstractArray,T<:AbstractArray,
						 U<:Array, V<:Array,I<:Integer}
	W::S
	logσW::S
	b::T
	logσb::T
	Wsample::U
	bsample::V
	σ::F
	η::I
end
function VariationalAffine(in::Integer, out::Integer, η::Integer,σ=identity)
		W = glorot_uniform(out,in) .+ 0.0
		logσW = -6.0 .+ glorot_uniform(out,in)#randn(out,in)
		b = zeros(out)
		logσb = -6.0 .+ glorot_uniform(out)
		return VariationalAffine(W, logσW, b, logσb, reparamSample(W,logσW,η),
		reparamSample(b,logσb,η), σ, η)
end
function (m::VariationalAffine)(x::Array)::Array
	m.Wsample .= reparamSample(m.W, m.logσW, m.η)
	m.bsample .= reparamSample(m.b, m.logσb, m.η)
	linear = m.Wsample .* x
	return  σArray(((li, bi)-> li.+bi).(linear,m.bsample),m.σ)
end
""" 
Draw a sample from weights using reparameterization trick
"""
function reparamSample(A::Array{<:Number}, logσA::Array{<:Number}, η::Integer)
	return [A .+ randn(size(A)...) .* exp.(logσA) for i in 1:η]
end
"""
Apply activation function to array of arrays
"""
function σArray(A::Array, σ)::Array
	return (Ai->σ.(Ai)).(A)
end

struct BayesNN{I<:Integer,L<:Array,P<:Flux.Zygote.Params}
	in::I
	out::I
	layers::L
	θ::P
end
function BayesNN(in::Integer, out::Integer, num_layers::Integer, 
				 num_hidden::Integer, η::Integer, σ=relu)
	# putting layers into array
	layers = [VariationalAffine(in, num_hidden, η, σ),]
	hidden_layers = [VariationalAffine(num_hidden, num_hidden, η, σ)
				     for i in 1:(num_layers-1)]
	append!(layers, hidden_layers)
	append!(layers, [VariationalAffine(num_hidden, out, η, σ),])
	# collecting into parameter array 
	P=[layers[1].W, layers[1].b]
	(L -> append!(P, [L.W, L.b])).(layers[2:end])
	return BayesNN(in, out, layers, Flux.params(P))
end
(bm::BayesNN)(x) = foldl((x,bm)->bm(x), bm.layers, init=x)
"""
Mean squared loss for BayesNN
"""
function bnnLoss(y,ŷ)
	sqrdiff = (d -> (d).^2).(ŷ .- [y])
	return mean(mean.(sqrdiff))
end

bnn = BayesNN(in, out, 10, 50, η);

loss(x, y) = bnnLoss(y, bnn([x'])')

gradient(()->loss(x,y), bnn.θ)

.= mutates an array. Replacing this with a regular = might make this work (might not though, I haven’t used Flux in quite some time).
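Concretely, that would mean binding the fresh samples to locals instead of writing them into the struct fields, something like this (a rough sketch only, untested):

function (m::VariationalAffine)(x::Array)
    # draw fresh samples every call; nothing stored on m is written to
    Wsample = reparamSample(m.W, m.logσW, m.η)
    bsample = reparamSample(m.b, m.logσb, m.η)
    linear = Wsample .* x
    return σArray(((li, bi) -> li .+ bi).(linear, bsample), m.σ)
end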

If you don’t mind a note on the code, (x->f.(x)).(y) is really hard to read. Usually, map and foreach can serve this purpose in a readable way. Consider:

function bnnLoss(y,ŷ)
    sqrdiff = map(yh->(yh .- y).^2, ŷ) 
    return mean(mean.(sqrdiff))
end

# you could also do this, which is pretty neat.
# the inner mean is taking a function as its first 
# argument, which it applies to each element before 
# `mean`ing it
function bnnLoss(y,ŷ)
    mean(mean(yh->(yh.-y).^2, ŷ))
end

Arguably more egregious, however, is that newcomers to Julia tend to “over-type” their code, usually because they think it affects performance, but sometimes for other reasons. It is usually best to keep code fairly general (even “under-typed”) for better reusability and readability. For example, if reparamSample were suddenly applied to a sparse array, it would error because it is annotated for Array; it would have been better not to provide a type at all.
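As a sketch of what that would look like for reparamSample (same body, annotations dropped):

# No annotations: this now also accepts sparse arrays, views, or anything
# else that supports size and broadcasting.
function reparamSample(A, logσA, η)
    return [A .+ randn(size(A)...) .* exp.(logσA) for i in 1:η]
end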

Thanks for the response! Unfortunately I want to keep track of Wsample within the VariationalAffine struct, so changing .= to = didn’t work in this case.

I appreciate the notes on style! I’ve made the updates to make my code more readable.

And you’re right that I was typing everything as much as possible in the hope that it would buy me some performance. Thanks for pointing out that this isn’t required.

From here, I believe the right way to do it is to make the struct mutable and “swap out” the arrays after each application of the layer. I based that on how RNNCell treats h, though h there is not an array. On a second read-through, perhaps the better approach is to make VariationalAffine a more “cell”-like layer (like LSTMCell and RNNCell) and wrap it in a Recur to keep track of the state.
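Something like this is what I have in mind (untested sketch, names made up; the cell hands the new state back instead of writing it into the struct):

struct VariationalAffineCell{S,T,F}
	W::S
	logσW::S
	b::T
	logσb::T
	σ::F
	η::Int
end

function (m::VariationalAffineCell)(state, x)
	# fresh samples become the new state, RNNCell-style
	Wsample = reparamSample(m.W, m.logσW, m.η)
	bsample = reparamSample(m.b, m.logσb, m.η)
	linear = Wsample .* x
	out = σArray(((li, bi) -> li .+ bi).(linear, bsample), m.σ)
	return (Wsample, bsample), out
end

That could then be wrapped in a Recur (e.g. Flux.Recur(cell, initial_samples)) so the latest samples are tracked the same way Recur tracks h for RNNCell.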

This is a common early misconception. The key to performance is type-stability.
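For instance, leaving the argument types off costs nothing, because Julia specializes the compiled method on the concrete types at each call site; what matters is whether the types inside the function can be inferred (a toy illustration):

# No argument annotations, yet fully specialized and fast for each
# concrete input type it is called with.
scale(A, logσA) = A .* exp.(logσA)

# @code_warntype is the tool for spotting real instabilities, e.g.
# @code_warntype scale(rand(3, 3), rand(3, 3))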


Thank you, I seriously appreciate you taking the time to help! Your proposed solution makes sense – I’ll make the changes. Take care
