Zygote @adjoint usage for covariance matrix calculation

kadir-gunel · September 16, 2022, 8:25am

Hello,

I implemented the Fréchet distance to use it as a loss function in Flux, but Zygote fails because it cannot differentiate the covariance calculation.

The Zygote documentation has a section on custom adjoints, but I don't understand how to extend Zygote so that it can handle the covariance method. Could someone explain how to implement it?

The function (Fréchet distance) is defined below:

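using Statistics, LinearAlgebra

function freschet_distance(X::T, Y::T) where {T<:AbstractMatrix}
    # columns are samples, rows are features
    μ1 = mean(X, dims=2)
    μ2 = mean(Y, dims=2)

    μ = sum((μ1 - μ2).^2)   # squared distance between the means

    σ1 = cov(X, dims=2)     # feature-by-feature covariance matrices
    σ2 = cov(Y, dims=2)

    # matrix square root of the product (not the entrywise sqrt.);
    # round-off can make it complex, so keep only the real part
    σ_mean = sqrt(σ1 * σ2)
    σ_mean = eltype(σ_mean) <: Complex ? real(σ_mean) : σ_mean

    return μ + tr(σ1 + σ2 - 2σ_mean)
end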
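For reference, the quantity being computed is the Fréchet distance between two Gaussians with means μ1, μ2 and covariances Σ1, Σ2 (the same formula FID uses):

```latex
d^2 = \lVert \mu_1 - \mu_2 \rVert_2^2
    + \operatorname{tr}\!\left( \Sigma_1 + \Sigma_2 - 2 \, (\Sigma_1 \Sigma_2)^{1/2} \right)
```

Here (Σ1Σ2)^{1/2} is a matrix square root, which in Julia is sqrt applied to the matrix (with LinearAlgebra loaded), not the entrywise sqrt. broadcast. Because Σ1Σ2 is generally not symmetric, the result can carry a tiny imaginary part from round-off, which is why implementations keep only its real part.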

By the way, two years ago I had a very similar problem with the CUDA.zeros and CUDA.fill methods ("this intrinsic must be compiled to be called"), and it was fixed by writing one line of code for each. At that time I just wanted to solve the problem and didn't (or couldn't) understand why those lines were needed.
This time, besides the covariance rule for Zygote, my other question is: the exact same distance function is used with PyTorch, and there is no "re-definition" of covariance for autograd in PyTorch. So why does Zygote need this kind of thing? What makes it special?

B.R.

mcabbott · September 16, 2022, 11:14am

I think the issue is the lack of a gradient rule for cov. Someone should write one, either from the formula or by finding another implementation to base it on.
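Working it out from the formula, for the layout used in the post (variables in rows as with cov(X, dims=2), n observations in columns) and with Σ̄ denoting the gradient of the loss with respect to Σ, a derivation sketch gives:

```latex
\begin{aligned}
\mu &= \tfrac{1}{n} X \mathbf{1}, \qquad X_c = X - \mu \mathbf{1}^{\top}, \qquad
\Sigma = \tfrac{1}{n-1} X_c X_c^{\top} \\
dL &= \operatorname{tr}\!\big(\bar{\Sigma}^{\top} d\Sigma\big)
    = \tfrac{1}{n-1} \operatorname{tr}\!\big(X_c^{\top} (\bar{\Sigma} + \bar{\Sigma}^{\top}) \, dX_c\big) \\
\bar{X} &= \tfrac{1}{n-1} \big(\bar{\Sigma} + \bar{\Sigma}^{\top}\big) X_c
\end{aligned}
```

The mean subtraction contributes nothing extra in the backward pass, because each row of (Σ̄ + Σ̄ᵀ)X_c already sums to zero. With corrected=false the factor 1/(n-1) becomes 1/n.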

Alternatively, for now you could use a simpler cov implementation which will be differentiable, something like mycov(x::AbstractVector; corrected::Bool=true) = sum(abs2, x .- mean(x)) / (length(x) - corrected).
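The one-liner above is for vectors; the post needs the matrix case cov(X, dims=2). A minimal sketch of the same idea for matrices, assuming variables in rows and observations in columns (mycov_rows is a hypothetical name, not a library function):

```julia
using Statistics

# Hypothetical helper: covariance between the rows of X
# (variables in rows, observations in columns), same layout as cov(X, dims=2).
# Uses only broadcasting, mean and a matrix product, with no in-place
# mutation, so an AD tool can trace through it without a custom rule.
function mycov_rows(X::AbstractMatrix; corrected::Bool=true)
    n = size(X, 2)                        # number of observations
    Xc = X .- mean(X, dims=2)             # subtract each row's mean
    return (Xc * Xc') / (n - corrected)   # divides by n - 1 when corrected
end
```

Assuming Zygote is installed, something like Zygote.gradient(X -> sum(abs2, mycov_rows(X)), X) should then work without any custom rule, since every operation inside already has one.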

Are you sure? This looks like it certainly needed some definitions.

kadir-gunel · September 16, 2022, 6:00pm

About the mycov function: what is the purpose of the boolean flag that is subtracted from the length? I tried it on a 20-element array, and the denominator is 19 when the flag is true and 20 otherwise.
I didn't know that subtracting a boolean from an integer is possible.

> Are you sure? This looks like it certainly needed some definitions.

The implementation I referred to is this one, which does not contain anything like what you mention. Since I was just looking for reference code, I didn't check whether it was used for training or testing. It turns out to be used for testing, i.e. as a distance (evaluation) metric rather than as a loss, so it never has to be differentiated.

So, you are right - as always :).

mcabbott · September 17, 2022, 1:07am

This is just like variance and other statistics functions, where it is standard to divide by n-1 rather than n (Bessel's correction) to account for only having a sample. And since Julia's booleans are integers (true == 1), subtracting the flag is a quick way to do it.
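To see both points concretely, a small check using mcabbott's mycov one-liner and the var function from the Statistics standard library:

```julia
using Statistics

# Bool is a subtype of Integer, so true acts as 1 and false as 0
@show Bool <: Integer   # true
@show 20 - true         # 19
@show 20 - false        # 20

# Divide by n - 1 (corrected) or by n (uncorrected)
mycov(x::AbstractVector; corrected::Bool=true) =
    sum(abs2, x .- mean(x)) / (length(x) - corrected)

x = collect(1.0:20.0)
# var follows the same convention, so the two should agree
@show mycov(x) ≈ var(x)
@show mycov(x; corrected=false) ≈ var(x; corrected=false)
```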

The linked issue now has a sketch of how to write the gradient, BTW.
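Following the formula route (this is not the code from the linked issue), the gradient of cov(X, dims=2) can be written as a plain function and checked against finite differences with only the standard library. cov_rows_pullback and fd_check are hypothetical names used for illustration:

```julia
using Statistics

# Hypothetical pullback for C = cov(X, dims=2) (variables in rows).
# Given Cbar = ∂L/∂C, returns ∂L/∂X = (Cbar + Cbar') * Xc / (n - corrected),
# where Xc is X with each row's mean subtracted.
function cov_rows_pullback(Cbar::AbstractMatrix, X::AbstractMatrix; corrected::Bool=true)
    n = size(X, 2)
    Xc = X .- mean(X, dims=2)
    return (Cbar + Cbar') * Xc / (n - corrected)
end

# Central-difference check on L(X) = sum(W .* cov(X, dims=2)),
# whose gradient with respect to C is exactly W.
function fd_check(; d::Int=3, n::Int=10, h::Float64=1e-6)
    X, W = randn(d, n), randn(d, d)
    L(Z) = sum(W .* cov(Z, dims=2))
    G = cov_rows_pullback(W, X)
    Gfd = similar(X)
    for i in eachindex(X)
        Xp = copy(X); Xp[i] += h
        Xm = copy(X); Xm[i] -= h
        Gfd[i] = (L(Xp) - L(Xm)) / (2h)
    end
    return maximum(abs.(G - Gfd))   # largest absolute discrepancy
end
```

To make Zygote use it, the usual route today is a ChainRulesCore rrule for cov (Zygote picks those up; @adjoint is the older mechanism) whose pullback returns (NoTangent(), cov_rows_pullback(unthunk(Cbar), X; corrected)) for dims=2; the dims=1 case is the same formula applied to X' and transposed back. Treat that wiring as an untested outline, and note that defining a rule for Statistics.cov in your own code is type piracy.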

kadir-gunel · September 22, 2022, 10:01am

Hello @mcabbott , sorry for the late reply.

Thank you for the link. I will try to use it.

B.R.
