Question about mutual information calculation

I came across this article which points out that
A(x,y) = sqrt(1 - exp(-2 * MI(x,y)))
(where MI(x,y) is the mutual information shared by data samples x and y) is a very nice measure of association between x and y.

One of the nice features that A(x,y) has is that A(x,y) = pearson correlation(x,y) when x and y are truly linearly related, e.g. if

n = 100_000;
rho = .3; 
x = randn(n); 
z = randn(n); 
y = rho * x + sqrt(1-rho^2)*z;

then, supposedly A(x,y) is approximately equal to rho.

The article uses R and the FNN package, which estimates mutual information from nearest neighbours using the method described in A. Kraskov, H. Stögbauer and P. Grassberger (2004). Unfortunately the source is a mix of C and R and looks pretty impenetrable at first glance.

I tried to implement A(x,y) in Julia and my implementation failed its very first sanity check: I failed to reproduce the A(x,y) = rho feature for data that’s truly linearly correlated.

I first tried computing MI(x,y) with TransferEntropy.mutualinfo(x,y,Kraskov1(2)) and variations (different neighbour counts k passed to Kraskov1 and Kraskov2, etc.). I thought this should be the same method and result as the R package, but it doesn’t seem to be close.

I’ve also tried InformationMeasures.get_mutual_information(x,y) but it also fails to reproduce the result.

Can anyone provide guidance here? Am I calling the mutual info functions from TransferEntropy.jl and InformationMeasures.jl wrong? Is there a different package I should be using? It seems like A(x,y) is a better tool than Pearson correlation whenever there’s genuine non-linearity and a decent S/N ratio so it would be nice to have a working Julia implementation…


For starters, A(x,y) as defined must be non-negative, while the correlation can be negative. So perhaps it approximates abs(correlation) or correlation^2.


Yes, it’s abs(correlation). It’s a scaled version of mutual information that lies in [0,1].
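To see why it’s abs(correlation): for a bivariate Gaussian with correlation rho, the mutual information in nats is MI(x,y) = -1/2 * log(1 - rho^2), so the formula gives abs(rho) exactly. A quick sanity check, just plugging the closed form in (no estimation involved):

rho = 0.3
mi = -0.5 * log(1 - rho^2)    # mutual information of a bivariate Gaussian, in nats
sqrt(1 - exp(-2 * mi))        # recovers abs(rho), here 0.3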

The method from InformationMeasures seems to be quite sensitive to the number of bins, and TransferEntropy did not install for me. CausalityTools provides matching estimates, though. Note that mutual information has units (bits, nats, etc.); most estimators return bits by default, whereas the formula from the article assumes nats!

julia> using CausalityTools

julia> n = 100_000;

julia> rho = .3;

julia> x = randn(n);

julia> z = randn(n);

julia> y = rho * x + sqrt(1-rho^2)*z;

julia> association(KSG1(), x, y)
0.05840275706881588

# factor of log(2) converts the estimate from bits to nats
julia> sqrt(1 - exp(- 2 * ans * log(2)))
0.27887733405855003

# Gaussian estimator should be more precise in this case
julia> association(GaussianMI(), x, y)
0.06971219063660127

julia> sqrt(1 - exp(- 2 * ans * log(2)))
0.30351059740004693
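
If you end up doing this a lot, the bits-to-nats conversion is easy to get wrong, so it may be worth wrapping it in a small helper (assoc_coef is just a name made up here, not part of any package):

assoc_coef(mi_bits) = sqrt(1 - exp(-2 * mi_bits * log(2)))   # A(x,y) from MI given in bits

assoc_coef(association(KSG1(), x, y))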

Digging into the R functions, a rough translation of one of them into Julia is:

using SpecialFunctions, Statistics

function mutinfo(X, Y; k = 10)
    n = length(X)
    length(Y) == n || error("X and Y inputs must have same length")
    k < n || error("k must be smaller than length of data")
    nx, ny, dx, dy, r = ntuple(i -> similar(X), 5)
    for i in 1:n
        # Chebyshev (max-norm) distances from point i to every point
        dx .= abs.(X[i] .- X)
        dy .= abs.(Y[i] .- Y)
        r .= ifelse.(dx .> dy, dx, dy)
        # distance to the k-th nearest neighbour (r's smallest entry is the point itself, at 0)
        di = partialsort!(r, k + 1)
        # marginal neighbour counts; these include point i itself, i.e. n_x(i)+1 and n_y(i)+1
        nx[i] = count(<(di), dx)
        ny[i] = count(<(di), dy)
    end
    # Kraskov et al. (2004) estimator 1 (KSG1), in nats
    mi = digamma(n) + digamma(k) - mean(digamma.(nx) .+ digamma.(ny))
    return mi
end

With this mutinfo I’ve managed to recapitulate the abs(corr) relationship. The function is far from good Julia style right now, but it works. Part of the difference is that it uses nearest neighbours instead of bins to estimate the entropies, with the mean log nearest-neighbour distance serving as the entropy approximation (as discussed in the references you gave).

ADDED: Here is a little test run of the function above:

function test_info(;rho = 0.3, n = 10_000)
    x = randn(n); 
    z = randn(n); 
    y = rho * x + sqrt(1-rho^2)*z;
    C, M = cor(x,y), max(0, mutinfo(x,y))   # mutinfo already returns nats; clamp tiny negative estimates
    return C, sqrt(1 - exp(-2 * M))
end

using UnicodePlots
rho_rng = -0.4:0.2:0.9
lineplot(rho_rng, stack(test_info(;rho) for rho in rho_rng; dims=1))

producing:

(UnicodePlots output: a lineplot with two series, y1 = cor(x, y) and y2 = sqrt(1 - exp(-2*M)), with the y-axis spanning roughly -0.36 to 0.79.)

Thank you, @Dan and @bertschi. Very helpful! Both responses solve my original problem.

Btw @bertschi, I had already converted to nats using the base argument of TransferEntropy.mutualinfo, but I didn’t mention it because the answers were still way off.

It’s interesting that Associations.association (apparently we should all upgrade from CausalityTools to Associations) is so much faster than @Dan’s implementation.
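
In case it helps to quantify the difference, here is a minimal timing sketch. It assumes BenchmarkTools is installed and reuses the association(KSG1(), x, y) call shown above (Associations is the successor to CausalityTools and, as far as I can tell, keeps the association API), together with @Dan’s mutinfo, which computes all pairwise distances and so scales roughly as O(n^2):

using BenchmarkTools, CausalityTools   # or: using Associations

n = 10_000
rho = 0.3
x = randn(n)
z = randn(n)
y = rho .* x .+ sqrt(1 - rho^2) .* z

@btime association(KSG1(), $x, $y)   # package kNN estimator
@btime mutinfo($x, $y)               # hand-rolled version from above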