I wonder the most efficient way to calculate the pairwise correlation between rows of a Matrix{Float64}
with a large number of rows and columns. I know “large” is subjective. Suppose the large is (5000, 50000)
(meaning 5000
rows, and 50000
columns). I asked GitHub Copilot chat for a suggestion about a high-performance code in this regard, and I got the following snippet as the suggestion:
using LinearAlgebra
function pairwise_correlations(A::Matrix{Float64})
n = size(A, 1)
C = similar(A, n, n)
for i in 1:n
BLAS.ger!(1.0, A[i, :], A[i, :], C)
end
BLAS.syrk!('L', 'T', 1.0, C, 0.0, C)
for i in 1:n
C[i, i] = 1.0
end
return C
end
And the explanation:
Note that calculating pairwise correlations between a large number of rows can be computationally expensive, so it’s important to consider the performance implications of this operation. The
BLAS
functions used in this example code snippet are highly optimized and can provide significant performance improvements over other methods, but they may not be the most efficient solution for all use cases.
Surprisingly, the code is super slow that I couldn’t measure it using @benchmark
. Still, the following snippet works better:
julia> a = rand(5000, 50000);
julia> using Statistics
julia> @benchmark cor($a, dims=2)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
Single result which took 23.153 s (0.00% GC) to evaluate,
with a memory estimate of 2.05 GiB, over 18 allocations.
I wonder how this can be improved.
Another suggested snippet is:
using Distributed
using DistributedArrays
# Start a local cluster with 4 worker processes
addprocs(4)
@everywhere using LinearAlgebra
function pairwise_correlations(A::Matrix{Float64})
n = size(A, 1)
C = drandn(n, n)
for i in 1:n
BLAS.ger!(1.0, A[i, :], A[i, :], C)
end
C = convert(Matrix{Float64}, C)
C = DistributedArray(C)
C = C * C'
for i in 1:n
C[i, i] = 1.0
end
return C
end
with the following explanation:
This code starts a local cluster with 4 worker processes using the
addprocs
function, and then uses the@everywhere
macro to load theLinearAlgebra
package on all worker processes. Thepairwise_correlations
function then creates a distributed arrayC
using thedrandn
function, and usesBLAS.ger!
to calculate the outer product of each row. The resulting distributed arrayC
is then converted to a regularMatrix{Float64}
and multiplied by its transpose using the*
operator. Finally, the diagonal elements ofC
are set to 1.0 to ensure that the correlation between a row and itself is 1.0.
Note that this implementation uses a large amount of memory due to the creation of the distributed array
C
. If memory usage is a concern, you may want to consider using a more memory-efficient algorithm or optimizing the code further.
That throws an error:
julia> a = rand(20, 30);
julia> pairwise_correlations(a)
ERROR: TaskFailedException
nested task error: On worker 2:
KeyError: key DistributedArrays [aaf54ef3-cdf8-58ed-94cc-d582ad619b94] not found
P.s.:
It’s not mandatory to be between pairs of rows. It can be between pairs of columns!