Zygote @adjoint usage for covariance matrix calculation


I implemented the Fréchet distance to use it as a loss function in Flux, but Zygote fails because it cannot differentiate the covariance calculation.

The Zygote documentation has a section on custom adjoints, but I don’t understand how to extend Zygote so that it can handle the covariance function. Could someone explain how to implement it?

The function (Fréchet distance) is defined as follows:

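using Statistics, LinearAlgebra   # mean and cov; tr

function freschet_distance(X::T, Y::T) where {T}
    μ1 = mean(X, dims=2)
    μ2 = mean(Y, dims=2)
    μ = sum((μ1 - μ2).^2)            # squared distance between the means
    σ1 = cov(X, dims=2)
    σ2 = cov(Y, dims=2)
    σ_mean = sqrt.(σ1 .* σ2)         # element-wise square root
    # keep only the real part if the entries are complex
    σ_mean = eltype(σ_mean) <: Complex ? real(σ_mean) : σ_mean
    return μ + tr(σ1 + σ2 - 2σ_mean)
end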

By the way, two years ago I had a very similar problem with the CUDA.zeros and CUDA.fill methods, which failed with the error "this intrinsic must be compiled to call". That problem was fixed by writing one line of code for each method. At the time I only wanted to solve the problem, and I did not understand why those lines were needed.
This time, in addition to the covariance definition for Zygote, I have another question: PyTorch uses exactly the same distance function, and there is no "re-definition" of covariance for autograd in PyTorch. So why does Zygote need this kind of thing? What makes it special?


I think the issue is the lack of a rule for cov. Someone should write one, either from the formula or by finding another implementation to base it on.

Alternatively, for now you could use a simpler cov implementation which will be differentiable, something like mycov(x::AbstractVector; corrected::Bool=true) = sum(abs2, x .- mean(x)) / (length(x) - corrected).
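For the matrix case in the question, cov(X, dims=2), the same idea can be sketched as below. This is only a minimal sketch for real-valued matrices: mycov_matrix is a hypothetical name, not a function from Statistics or Zygote, and whether Zygote differentiates it in a given setup is an assumption to check rather than a guarantee. It uses only broadcasting, mean and a matrix product.

```julia
using Statistics

# Hypothetical helper, not part of Statistics or Zygote.
# Each row of X is a variable and each column an observation,
# which is the layout that cov(X, dims=2) assumes.
function mycov_matrix(X::AbstractMatrix{<:Real}; corrected::Bool=true)
    n = size(X, 2)                 # number of observations
    Xc = X .- mean(X, dims=2)      # center every row
    return (Xc * Xc') / (n - corrected)
end

X = [1.0 2.0 4.0 7.0;
     0.5 -1.0 3.0 2.0]

# Compare against the library implementation
mycov_matrix(X) ≈ cov(X, dims=2)
```

In freschet_distance, the two cov(…, dims=2) calls could then be replaced by mycov_matrix(X) and mycov_matrix(Y).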

Are you sure? This looks like it certainly needed some definitions.

About the mycov function: what is the purpose of using a Boolean value and subtracting it from the length? I just tried it on a 20-element array, and the denominator is 19 when the Boolean flag is true and 20 otherwise.
I didn’t know that subtracting a Boolean from an integer was possible.

"Are you sure? This looks like it certainly needed some definitions."

The implementation that I referred to is this one, which does not have anything like what you mentioned. Since I was only looking for reference code, I didn’t check whether it is used for training or for testing. It seems to be used for testing, so it serves as a distance function and not as a loss.

So, you are right - as always :).

This is just one of the stats functions like variance where it’s standard to divide by n-1 to account for only having a sample. And since Julia’s booleans are integers, true == 1, subtracting the flag is a quick way to do it.
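A small check of the Boolean arithmetic described above. The one-liner mycov from earlier in the thread is repeated so that the block runs on its own, and it is compared against var from Statistics for a 20-element vector.

```julia
using Statistics

x = collect(1.0:20.0)
n = length(x)

Bool <: Integer        # Bool is a subtype of Integer
n - true               # 19, the denominator of the corrected (sample) estimate
n - false              # 20, the denominator of the uncorrected estimate

# The one-liner from earlier in the thread
mycov(x::AbstractVector; corrected::Bool=true) =
    sum(abs2, x .- mean(x)) / (length(x) - corrected)

# Both variants should agree with the library variance
mycov(x) ≈ var(x)
mycov(x; corrected=false) ≈ var(x; corrected=false)
```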

The linked issue now has a sketch of how to write the gradient, BTW.
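For readers without access to the issue, here is a rough sketch of what such a gradient can look like for real matrices with observations along dims=2. It is not the sketch from the linked issue. cov_with_pullback is a hypothetical name, and the pullback is written as a plain closure so that the block depends only on Statistics. Wiring it into Zygote (for example through @adjoint or a ChainRules rule) is left out here.

```julia
using Statistics

# Hypothetical helper: returns the covariance of the rows of X
# (the same value as cov(X, dims=2)) together with a pullback that maps
# a cotangent Δ of the covariance to a gradient with respect to X.
function cov_with_pullback(X::AbstractMatrix{<:Real}; corrected::Bool=true)
    d = size(X, 2) - corrected
    Xc = X .- mean(X, dims=2)
    C = (Xc * Xc') / d
    # From C = Xc * Xc' / d it follows that dL/dXc = (Δ + Δ') * Xc / d.
    # Centering this result again would change nothing, because the rows
    # of Xc already sum to zero.
    pullback(Δ) = ((Δ + Δ') * Xc) / d
    return C, pullback
end

X = [1.0 2.0 4.0 7.0;
     0.5 -1.0 3.0 2.0]
W = [1.0 2.0;
     -0.5 3.0]

C, back = cov_with_pullback(X)
G = back(W)    # gradient of sum(W .* cov(X, dims=2)) with respect to X
```

A finite-difference comparison against sum(W .* cov(X, dims=2)) is a simple way to sanity-check a pullback like this before relying on it in training.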

Hello @mcabbott, sorry for the late reply.

Thank you for the link. I will try to use it.