# Error in Flux when calculating the gradient due to an in-place operation

I’m diving into `Flux.jl` for my research. My objective is to define a loss function that measures, so to speak, the convergence of the model towards the data distribution, via a statistic that should become uniform.

I generate `K` random fictitious observations and compare how many of them are smaller than the true data in the training set. In other words, the model generates `K` simulations, and we determine the number of simulations where the generated data is smaller than the actual data. If the model is well trained, this distribution should converge to a uniform distribution.

However, since the resulting histogram is not differentiable, I needed to approximate this idea using compositions of continuous differentiable functions.
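For reference, the discrete (non-differentiable) counting that this construction approximates might look like the following standalone sketch (`rank_histogram` and `simulate` are hypothetical names, just to illustrate):

```julia
# Non-differentiable version of the idea: for each real observation y,
# count how many of the K simulated values fall below it, and histogram
# those counts. For a well-calibrated model the K+1 bins come out uniform.
function rank_histogram(ys, simulate, K)
    counts = zeros(Int, K + 1)      # bins for ranks 0, 1, …, K
    for y in ys
        yₖ = simulate(K)            # K fictitious observations
        r = count(<(y), yₖ)         # rank of y among the fakes
        counts[r + 1] += 1
    end
    return counts ./ length(ys)     # relative frequencies
end
```

With `simulate(K) = randn(K)` and `ys` drawn from the same standard normal, each bin tends to `1/(K+1)`; it is the `counts[r + 1] += 1` step that has no gradient.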

```julia
η = 0.1; num_epochs = 100; n_samples = 1000
losses = []
l = CustomLoss(2)
for epoch in 1:num_epochs
    aₖ = zeros(l.K + 1)
    for _ in 1:n_samples
        x = rand(Normal(μ, stddev), l.K)
        yₖ = m(x')
        y = target(rand(Normal(μ, stddev)))
        aₖ += generate_aₖ(l, yₖ, y)
    end
    loss = scalar_diff(l, aₖ ./ sum(aₖ))
    push!(losses, loss)
end
```

So, `generate_aₖ(l, yₖ, y)` returns a vector of size `K+1`, where the element at index `k+1` is close to `1` if `y` is greater than exactly `k` elements of the vector `yₖ`.

Finally, `scalar_diff` calculates the squared difference between the calculated `aₖ` and the uniform distribution.
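As a quick concrete check, here is a standalone version of that squared difference (re-defined so it runs on its own, with `K` fixed to `2` as in `CustomLoss(2)`):

```julia
K = 2
scalar_diff(aₖ) = sum((aₖ .- 1 / (K + 1)) .^ 2)  # squared distance to uniform

scalar_diff(fill(1 / (K + 1), K + 1))  # exactly uniform → 0.0
```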

However, this code is not functioning correctly, and I am encountering the following issue due to the line `aₖ += generate_aₖ(l, yₖ, y)`,

```
Mutating arrays is not supported -- called copyto!(Vector{Float64}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)
```

I’m not sure how to solve this problem. Does anyone have ideas for a workaround? I need to differentiate with respect to `aₖ`.

What do `target()` and `CustomLoss()` do? What is the objective of the model?

I think we mostly need to know what `generate_aₖ` does.

Hello! `target` is nothing more than the function I want the model to learn. In this case, it’s the simple line

```julia
line(x; m, b) = m * x + b
m = 3; b = 5
target(x) = line(x; m=m, b=b)
```

This is obviously a toy example; later on, the target will be real observations.

`CustomLoss` is nothing more than a structure that contains the hyperparameter `K` , which represents the number of fictitious samples generated at each iteration. It can be seen as a parameter that measures the required resolution.

I have implemented `generate_aₖ` in the following way:

```julia
struct CustomLoss
    K::Int

    function CustomLoss(K::Int)
        new(K)
    end
end

function generate_aₖ(loss::CustomLoss, ŷ, y)
    aₖ = zeros(loss.K + 1)
    @inbounds for k in 0:loss.K
        aₖ .+= γ(ŷ, y, k, loss.K + 1)
    end
    return aₖ
end

scalar_diff = (loss::CustomLoss, a_k) -> sum((a_k .- (1 ./ (loss.K + 1))) .^ 2)
jensen_shannon_∇ = (loss::CustomLoss, a_k) -> jensen_shannon_divergence(a_k, fill(1 / (loss.K + 1), 1, loss.K + 1))

function jensen_shannon_divergence(p, q)
    ϵ = 1e-3
    p .+= ϵ; q .+= ϵ
    return 0.5 * (kldivergence(p, q) + kldivergence(q, p))
end

function sigmoid(ŷ, y)
    return σ.((ŷ - y) * 10)
end

function ψₘ(y, m)
    stddev = 0.1
    return exp.(-0.5 .* ((y .- m) ./ stddev) .^ 2)
end

function ϕ(yₖ, yₙ)
    return sum(sigmoid.(yₙ, yₖ))
end

function γ(yₖ, yₙ, m, K)
    eₘ = (n, m) -> [j == m ? 1.0 : 0.0 for j in 0:n-1]
    return eₘ(K, m) * ψₘ(ϕ(yₖ, yₙ), m)
end
```

As I mentioned, I need to transform the concept of counting (a histogram) into a differentiable operation. To do this, I have done the following.

First, I use a sigmoid to check whether a fictitious observation (out of the `K` generated) is smaller than the real observation. This can be repeated for all `K` observations by simply summing the sigmoids (the `ϕ` function). After this, I want to generate differentiable histogram bins. To achieve this, it suffices to use a bump function (I tried mollifiers, but in the end, when running this example in PyTorch, I found that a normalized Gaussian worked better) that sums to nearly `1` when the real observation is greater than exactly `m` of the `K` fictitious observations, and is zero otherwise. We have `K+1` bump functions, centered at `0, 1, …, K`. With the `γ` function, I simply associate the previous result with the vector component that represents histogram bin `m`. Finally, `generate_aₖ` is nothing more than an iteration over all the vector components to generate each of the bins.
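To make this concrete with numbers, here is a standalone check using scalar re-definitions of the helpers (with a local logistic `σ`, since above it comes from Flux):

```julia
σ(x) = 1 / (1 + exp(-x))                  # logistic, as in Flux/NNlib
ϕ(yₖ, yₙ) = sum(σ.((yₙ .- yₖ) .* 10))     # soft count of fakes below yₙ
ψₘ(y, m) = exp(-0.5 * ((y - m) / 0.1)^2)  # bump centered at m

# K = 2 fakes at 0.0 and 1.0, real observation at 0.5: exactly one fake
# is below it, so the soft count ≈ 1 and the mass lands in bin k = 1.
yₖ = [0.0, 1.0]; y = 0.5; K = 2
aₖ = [ψₘ(ϕ(yₖ, y), k) for k in 0:K]  # ≈ [0.0, 1.0, 0.0]
```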

It’s a bit convoluted, I’m sorry; explaining the idea is not trivial. If you have any questions, please let me know. I’m trying to make this work in Julia because I’ve seen that there can be a substantial improvement in execution times, and because of the ease of parallelization it offers, which Python lacks.

From what I understand, there are two separate problems:

• One is mathematical: differentiating through a random sample. Indeed, this requires either reparametrization or REINFORCE gradients.
• One is practical: differentiating through a mutating function with Zygote.jl. The line I quoted above is presumably the one that breaks Flux.jl.

Thank you very much. Regarding the second issue, do you think it could be solved using `Zygote.Buffer` ? I’m a bit of a beginner with Flux, so I’m not sure.
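For reference, `Zygote.Buffer` does handle this pattern: you write into the buffer and freeze it with `copy` at the end. A self-contained sketch with scalar re-definitions of the helpers (`generate_aₖ_buffered` is a hypothetical name; it relies on the fact that `γ(ŷ, y, k, K + 1)` just places `ψₘ(ϕ(ŷ, y), k)` in slot `k + 1`):

```julia
using Zygote

σ(x) = 1 / (1 + exp(-x))                  # logistic, as in Flux/NNlib
ϕ(yₖ, yₙ) = sum(σ.((yₙ .- yₖ) .* 10))     # soft count of fakes below yₙ
ψₘ(y, m) = exp(-0.5 * ((y - m) / 0.1)^2)  # bump centered at m

function generate_aₖ_buffered(K, ŷ, y)
    buf = Zygote.Buffer(zeros(K + 1))  # array-like, writable under AD
    c = ϕ(ŷ, y)                        # soft count of ŷ entries below y
    for k in 0:K
        buf[k + 1] = ψₘ(c, k)
    end
    return copy(buf)                   # copy(::Buffer) -> ordinary array
end

# Gradients now go through:
Zygote.gradient(v -> sum(generate_aₖ_buffered(2, v, 0.5)), [0.0, 1.0])
```

That said, since each bin is independent, avoiding mutation entirely (e.g. building the vector in one expression) is usually simpler than `Buffer`.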

This should work:

```julia
generate_aₖ(loss, ŷ, y) = sum(γ(ŷ, y, k, loss.K + 1) for k in 0:loss.K)
```

But it is still not clear to me: if we want to check that the distribution of `model(x)` matches the data, why not use something like moment matching?
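For context, moment matching here might look like the following sketch (hypothetical helper, not from the thread): match a few sample moments of the generated values to those of the observed data, which is differentiable with no counting at all.

```julia
using Statistics

# Penalize the gap between the first two sample moments of the model's
# outputs and those of the observed data.
function moment_matching_loss(generated, observed)
    (mean(generated) - mean(observed))^2 + (std(generated) - std(observed))^2
end
```

Higher moments can be added the same way, though their sample estimates get noisier.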