Question about mutual information calculation

I came across this article which points out that
A(x,y) = sqrt(1 - exp(-2 * MI(x,y)))
(where MI(x,y) is the mutual information shared by data samples x and y) is a very nice measure of association between x and y.

One of the nice features that A(x,y) has is that A(x,y) = pearson correlation(x,y) when x and y are truly linearly related, e.g. if

n = 100_000;
rho = .3; 
x = randn(n); 
z = randn(n); 
y = rho * x + sqrt(1-rho^2)*z;

then, supposedly A(x,y) is approximately equal to rho.

The article uses R and the FNN package, which estimates mutual information from nearest neighbours using the method described in A. Kraskov, H. Stögbauer and P. Grassberger (2004). Unfortunately the source is a mix of C and R and looks pretty impenetrable at first glance.

I tried to implement A(x,y) in Julia and my implementation failed its very first sanity check: I failed to reproduce the A(x,y) = rho feature for data that’s truly linearly correlated.

I first tried computing MI(x,y) with TransferEntropy.mutualinfo(x,y,Kraskov1(2)) and variations (different neighbour counts k passed to Kraskov1 and Kraskov2, etc.). I thought this should be the same method and result as the R package, but it doesn’t seem to be close.

I’ve also tried InformationMeasures.get_mutual_information(x,y) but it also fails to reproduce the result.

Can anyone provide guidance here? Am I calling the mutual info functions from TransferEntropy.jl and InformationMeasures.jl wrong? Is there a different package I should be using? It seems like A(x,y) is a better tool than Pearson correlation whenever there’s genuine non-linearity and a decent S/N ratio so it would be nice to have a working Julia implementation…


For starters, A(x,y) as defined must be non-negative, while the correlation can be negative. So perhaps it approximates abs(correlation) or correlation^2.


Yes, it’s abs(correlation). It’s a scaled version of mutual information that lies in [0,1].
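To see why it’s abs(correlation): for a bivariate Gaussian with correlation rho, the mutual information in nats is MI(x,y) = -1/2 * log(1 - rho^2), so the formula gives abs(rho) exactly. A quick sanity check, just plugging the closed form in (no estimation involved):

rho = 0.3
mi = -0.5 * log(1 - rho^2)    # mutual information of a bivariate Gaussian, in nats
sqrt(1 - exp(-2 * mi))        # recovers abs(rho), here 0.3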

The method from InformationMeasures seems to be quite sensitive to the number of bins, and TransferEntropy did not install for me. CausalityTools provides matching estimates, though. Note that mutual information has units (bits, nats, etc.); most estimators return bits by default, whereas the formula from the article assumes nats!

julia> using CausalityTools

julia> n = 100_000;

julia> rho = .3;

julia> x = randn(n);

julia> z = randn(n);

julia> y = rho * x + sqrt(1-rho^2)*z;

julia> association(KSG1(), x, y)
0.05840275706881588

# factor of log(2) converts the estimate from bits to nats
julia> sqrt(1 - exp(- 2 * ans * log(2)))
0.27887733405855003

# Gaussian estimator should be more precise in this case
julia> association(GaussianMI(), x, y)
0.06971219063660127

julia> sqrt(1 - exp(- 2 * ans * log(2)))
0.30351059740004693
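
If you end up doing this a lot, the bits-to-nats conversion is easy to get wrong, so it may be worth wrapping it in a small helper (assoc_coef is just a name made up here, not part of any package):

assoc_coef(mi_bits) = sqrt(1 - exp(-2 * mi_bits * log(2)))   # A(x,y) from MI given in bits

assoc_coef(association(KSG1(), x, y))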

Digging into the R functions, a rough translation of one of them into Julia is:

using SpecialFunctions, Statistics

function mutinfo(X, Y; k = 10)
    n = length(X)
    length(Y) == n || error("X and Y inputs must have same length")
    k < n || error("k must be smaller than length of data")
    nx, ny, dx, dy, r = ntuple(i -> similar(X), 5)
    for i in 1:n
        # Chebyshev (max-norm) distances from point i to every point
        dx .= abs.(X[i] .- X)
        dy .= abs.(Y[i] .- Y)
        r .= ifelse.(dx .> dy, dx, dy)
        # distance to the k-th nearest neighbour (r's smallest entry is the point itself, at 0)
        di = partialsort!(r, k + 1)
        # marginal neighbour counts; these include point i itself, i.e. n_x(i)+1 and n_y(i)+1
        nx[i] = count(<(di), dx)
        ny[i] = count(<(di), dy)
    end
    # Kraskov et al. (2004) estimator 1 (KSG1), in nats
    mi = digamma(n) + digamma(k) - mean(digamma.(nx) .+ digamma.(ny))
    return mi
end

With this mutinfo I’ve managed to recapitulate the abs(corr) relationship. The function is far from good Julia style right now, but it works. Part of the difference is that it uses nearest neighbours instead of bins to estimate the entropies, with the mean log nearest-neighbour distance serving as the entropy approximation (as discussed in the references you gave).

ADDED: Here is a little test run of the function above:

function test_info(;rho = 0.3, n = 10_000)
    x = randn(n); 
    z = randn(n); 
    y = rho * x + sqrt(1-rho^2)*z;
    C, M = cor(x,y), max(0, mutinfo(x,y))   # mutinfo already returns nats; clamp tiny negative estimates
    return C, sqrt(1 - exp(-2 * M))
end

using UnicodePlots
rho_rng = -0.4:0.2:0.9
lineplot(rho_rng, stack(test_info(;rho) for rho in rho_rng; dims=1))

producing:

(UnicodePlots output: a lineplot with two series, y1 = cor(x, y) and y2 = sqrt(1 - exp(-2*M)), with the y-axis spanning roughly -0.36 to 0.79.)

Thank you, @Dan and @bertschi. Very helpful! Both responses solve my original problem.

Btw @bertschi, I had already converted to nats using the base argument of TransferEntropy.mutualinfo, but I didn’t mention it because the answers were still way off.

It’s interesting that Associations.association (apparently we should all upgrade from CausalityTools to Associations) is so much faster than @Dan’s implementation.
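
In case it helps to quantify the difference, here is a minimal timing sketch. It assumes BenchmarkTools is installed and reuses the association(KSG1(), x, y) call shown above (Associations is the successor to CausalityTools and, as far as I can tell, keeps the association API), together with @Dan’s mutinfo, which computes all pairwise distances and so scales roughly as O(n^2):

using BenchmarkTools, CausalityTools   # or: using Associations

n = 10_000
rho = 0.3
x = randn(n)
z = randn(n)
y = rho .* x .+ sqrt(1 - rho^2) .* z

@btime association(KSG1(), $x, $y)   # package kNN estimator
@btime mutinfo($x, $y)               # hand-rolled version from above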