Error in Flux when calculating the gradient due to an in-place operation

I’m diving into Flux.jl for my research. My objective is to define a loss function that would measure, so to speak, the convergence of the model towards a uniform data distribution.

I generate K random fictitious observations and compare how many of them are smaller than the true data in the training set. In other words, the model generates K simulations, and we determine the number of simulations where the generated data is smaller than the actual data. If the model is well trained, this distribution should converge to a uniform distribution.

However, since the resulting histogram is not differentiable, I needed to approximate this idea using compositions of continuous differentiable functions.
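The core trick in miniature: replace the hard comparison yᵢ < y with a steep logistic, so the count becomes a smooth function of the samples (a toy sketch; `softcount` and the steepness `s` are illustrative names, not part of the actual code):

```julia
# Toy sketch: a steep logistic stands in for the hard comparison
# yᵢ < y, making the count of smaller samples differentiable.
σ(z) = 1 / (1 + exp(-z))
softcount(yₖ, y; s = 50) = sum(σ((y - yᵢ) * s) for yᵢ in yₖ)

yₖ = [0.1, 0.4, 0.9]
c = softcount(yₖ, 0.6)   # two of the three samples lie below 0.6, so c ≈ 2
```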

η = 0.1; num_epochs = 100; n_samples = 1000
optim = Flux.setup(Flux.Adam(η), model)
losses = []
l = CustomLoss(2)
for epoch in 1:num_epochs
    loss, grads = Flux.withgradient(model) do m
        aₖ = zeros(l.K+1)
        for _ in 1:n_samples
            x = rand(Normal(μ, stddev), l.K)
            yₖ = m(x')
            y = target(rand(Normal(μ, stddev)))
            aₖ += generate_aₖ(l, yₖ, y)
        end
        scalar_diff(l, aₖ ./ sum(aₖ))
    end
    Flux.update!(optim, model, grads[1])
    push!(losses, loss)
end

So, generate_aₖ(l, yₖ, y) would return a vector of size K+1, where the element at index k is close to 1 if y is greater than k elements of the vector yₖ.

Finally, scalar_diff calculates the squared difference between the calculated aₖ and the uniform distribution.

However, this code does not work: I get the following error, apparently caused by the line aₖ += generate_aₖ(l, yₖ, y):

Mutating arrays is not supported -- called copyto!(Vector{Float64}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

I’m not sure how to solve this problem. Does anyone have an idea for a workaround? I need to differentiate with respect to aₖ.
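For the mutation error specifically, the usual workaround is to build the accumulated vector functionally (e.g. with `sum` over a generator) instead of writing into it. A minimal sketch in plain Julia, with `softbin` as a hypothetical stand-in for the per-bin contribution:

```julia
# Hypothetical soft one-hot bin (stand-in for the real per-bin term).
softbin(x, k, K) = [j == k ? x : 0.0 for j in 0:K-1]

# Mutating accumulation: the `.+=` broadcast-assigns in place,
# which is exactly the copyto! that Zygote rejects.
function bins_mutating(x, K)
    a = zeros(K)
    for k in 0:K-1
        a .+= softbin(x, k, K)
    end
    return a
end

# Non-mutating accumulation: `sum` over a generator builds the
# vector in one shot, with no in-place writes for Zygote to see.
bins_functional(x, K) = sum(softbin(x, k, K) for k in 0:K-1)
```

Both versions produce the same vector; only the second is safe to differentiate through with Zygote.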

What do target() and CustomLoss() do? What is the objective of the model?

I think we mostly need to know what generate_aₖ does.

Hello! Target is nothing more than the model I want to learn. In this case, it’s the simple line

line(x; m, b) = m * x + b
m = 3; b = 5
target(x) = line(x; m = m, b = b)

This is obviously a toy example; later it will be replaced by real observations.

CustomLoss is nothing more than a struct holding the hyperparameter K, the number of fictitious samples generated at each iteration. It can be seen as a parameter controlling the required resolution.

I have implemented generate_ak in the following way.

struct CustomLoss
    K::Int
end

function generate_aₖ(loss::CustomLoss, ŷ, y)
    aₖ = zeros(loss.K+1)
    @inbounds for k in 0:loss.K
        aₖ .+= γ(ŷ, y, k, loss.K+1)  # in-place broadcast: the copyto! Zygote complains about
    end
    return aₖ
end

scalar_diff = (loss::CustomLoss, a_k) -> sum((a_k .- 1 / (loss.K + 1)) .^ 2)
jensen_shannon_∇ = (loss::CustomLoss, a_k) -> jensen_shannon_divergence(a_k, fill(1 / (loss.K + 1), 1, loss.K + 1))

function jensen_shannon_divergence(p, q)
    ϵ = 1e-3                  # smoothing to avoid log(0)
    p = p .+ ϵ; q = q .+ ϵ    # rebind instead of mutating the inputs
    return 0.5 * (kldivergence(p, q) + kldivergence(q, p))
end

function sigmoid(ŷ, y)
    return σ.((ŷ-y)*10)
end

function ψₘ(y, m)
    stddev = 0.1
    return exp.((-0.5 .* ((y .- m) ./ stddev) .^ 2))
end

function ϕ(yₖ, yₙ)
    return sum(sigmoid.(yₙ, yₖ))
end

function γ(yₖ, yₙ, m, K)
    eₘ = (n, m) -> [j == m ? 1.0 : 0.0 for j in 0:n-1]
    return eₘ(K, m) * ψₘ(ϕ(yₖ, yₙ), m)
end

As I mentioned, I need to turn the concept of counting (a histogram) into a differentiable operation. First, I use a sigmoid to check whether a fictitious observation (out of the K generated) is smaller than the real observation. This can be repeated for all K observations by simply summing the sigmoids (the ϕ function).

After that, I want differentiable histogram bins. For this it suffices to use a bump function (I tried mollifiers, but in the end, when running this example in PyTorch, I found that a normalized Gaussian worked better) that is close to 1 when the real observation is greater than exactly m of the K fictitious observations, and close to zero otherwise. We have K+1 bump functions, centered at 0, 1, …, K. With the gamma (γ) function, I simply associate that result with vector component m, which represents a histogram bin. Finally, generate_aₖ is nothing more than an iteration over all the vector components to build each of the bins.
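To make that concrete, here is a self-contained run of the functions above (with σ defined locally instead of taking Flux’s, and generate_aₖ taking K directly instead of a CustomLoss): when the real observation is larger than both of the K = 2 fictitious samples, essentially all of the mass lands in the last bin.

```julia
σ(z) = 1 / (1 + exp(-z))                      # logistic; Flux exports this as σ
sigmoid(ŷ, y) = σ.((ŷ .- y) .* 10)            # soft indicator of ŷ > y
ψₘ(y, m) = exp.(-0.5 .* ((y .- m) ./ 0.1) .^ 2)  # Gaussian bump centered at m
ϕ(yₖ, yₙ) = sum(sigmoid(yₙ, yₖ))              # soft count of yₖ entries below yₙ
eₘ(n, m) = [j == m ? 1.0 : 0.0 for j in 0:n-1]   # m-th standard basis vector
γ(yₖ, yₙ, m, K) = eₘ(K, m) * ψₘ(ϕ(yₖ, yₙ), m)
generate_aₖ(K, yₖ, y) = sum(γ(yₖ, y, k, K + 1) for k in 0:K)

yₖ = [1.0, 2.0]                # K = 2 fictitious samples
a = generate_aₖ(2, yₖ, 10.0)   # real y above both fakes → last bin ≈ 1
```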

It’s a bit convoluted, I’m sorry; explaining the idea is not trivial. If you have any questions, please let me know. I’m trying to make this work in Julia because I’ve seen that it can offer a substantial improvement in execution time, and because of the ease of parallelization it offers, which Python lacks.

From what I understand, there are two separate problems:

  • One is mathematical: differentiating through a random sample. Indeed, this requires either reparametrization or REINFORCE gradients.
  • One is practical: differentiating through a mutating function with Zygote.jl. The line I quoted above is presumably the one that breaks Flux.jl.

Thank you very much. Regarding the second issue, do you think it could be solved using Zygote.Buffer? I’m a bit of a beginner with Flux, so I’m not sure.

This should work, as long as the per-bin contributions are summed back into a single vector (γ already returns a (K+1)-vector that is zero outside component k, so a plain comprehension would give a vector of vectors):

generate_aₖ(loss, ŷ, y) = sum(γ(ŷ, y, k, loss.K+1) for k in 0:loss.K)

But it is still not clear to me: if we want to check that the distribution of model(x) is uniform, why not use something like moment matching?
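For reference, a moment-matching loss could look like this (an illustrative sketch, not from the thread): penalize the deviation of the first P raw moments of the samples from those of Uniform(0, 1), using E[Xᵖ] = 1/(p + 1).

```julia
using Statistics

# Illustrative moment-matching loss: squared deviation of the first
# P raw moments of `samples` from those of Uniform(0, 1).
moment_loss(samples, P = 4) =
    sum((mean(samples .^ p) - 1 / (p + 1))^2 for p in 1:P)

u = range(0, 1; length = 1001)   # near-uniform samples → loss ≈ 0
d = fill(0.5, 1001)              # degenerate samples → clearly larger loss
```

This loss is smooth in the samples, so it differentiates cleanly with Zygote, though it only matches moments and does not pin down the full distribution.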