# (Blog Post) Optimizing Repeated Correlations

I wrote a short post on optimizing repeated correlations using math (and Julia): https://www.lesswrong.com/posts/AESkD3gafBXx6pm77/optimizing-repeated-correlations


```julia
using Statistics
using BenchmarkTools

n = 1000
a = rand(n);
b = rand(n);
c = rand(n);
xs = [rand(n) for i in 1:10_000];

function raw_correlations(a, b, c, xs)
    return [cor(x, y) for y in [a, b, c] for x in xs]
end

display(@benchmark raw_correlations($a, $b, $c, $xs))
```
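The optimization the post builds on is the z-score trick: cor(x, y) is just the dot product of the two vectors' z-scores divided by n − 1, so you can z-score every vector once and replace each cor call with a cheap dot product. A minimal sketch of that idea (the helper names here are mine, not the post's exact code):

```julia
# z-score a vector: subtract the mean, divide by the standard deviation.
function zscores(x)
    μ = mean(x)
    σ = std(x; mean=μ)
    return (x .- μ) ./ σ
end

function zscore_correlations(a, b, c, xs)
    m = length(a) - 1
    # Fold the 1/(n - 1) factor into the z-scores of a, b, c up front.
    za = zscores(a) ./ m
    zb = zscores(b) ./ m
    zc = zscores(c) ./ m
    zxs = [zscores(x) for x in xs]   # z-score each x once, reused for a, b, and c
    return [zx' * y for y in [za, zb, zc] for zx in zxs]
end

display(@benchmark zscore_correlations($a, $b, $c, $xs))
```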

This gets me the base timing:

```
BenchmarkTools.Trial: 284 samples with 1 evaluation.
 Range (min … max):  17.108 ms …  21.202 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     17.510 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   17.643 ms ± 536.038 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram: frequency by time, 17.1 ms to 20 ms]

 Memory estimate: 812.81 KiB, allocs estimate: 12.
```

And the z-score version:

```
BenchmarkTools.Trial: 1552 samples with 1 evaluation.
 Range (min … max):  1.876 ms … 88.282 ms  ┊ GC (min … max): 0.00% … 96.89%
 Time  (median):     2.614 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.208 ms ±  3.527 ms  ┊ GC (mean ± σ):  6.73% ±  6.68%

 [histogram: log(frequency) by time, 1.88 ms to 13.3 ms]

 Memory estimate: 2.32 MiB, allocs estimate: 20497.
```

I suspect there's also a clever linear algebra trick here, but my dumb use of cor alone ends up being relatively costly.

The linear algebra code:

```julia
A = [a b c]                   # n × 3 matrix with a, b, c as columns
X = reduce(hcat, xs)          # n × 10_000 matrix with one x per column
linalg_cor = vec(cor(A, X)')  # all 30_000 correlations in one call
```
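A quick sanity check (using the definitions above) that the matrix call agrees with the element-wise loop; `vec(cor(A, X)')` lays the results out in the same order as the flat comprehension:

```julia
# Both should match to floating-point noise.
raw = raw_correlations(a, b, c, xs)
@assert maximum(abs.(raw .- linalg_cor)) < 1e-12
```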

The times for the methods are:

• Original: 16 ms
• Matrix cor: 40 ms

Obviously, you could probably compose these with your z-score method as well for additional improvements.
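For instance, a sketch of that composition (reusing the hypothetical `zscores` helper from earlier): z-score every column once, then a single matrix multiply produces all the correlations.

```julia
# cor(x, y) == zscores(x)' * zscores(y) / (n - 1), so one matmul covers every pair.
Z  = reduce(hcat, [zscores(x) for x in xs])         # n × 10_000
ZA = reduce(hcat, [zscores(v) for v in (a, b, c)])  # n × 3
composed = vec((Z' * ZA) ./ (n - 1))                # same layout as vec(cor(A, X)')
```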


I don't think your codes compute the same things.

This needs to be:

```julia
julia> correlations2 = [zscores(x)'*y for y in [za, zb, zc] for x in xs];

julia> sum(abs2, correlations .- correlations2)
8.499807095096476e-30
```
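For this check, za, zb, and zc are presumably the scaled z-scores carrying the 1/(n − 1) factor, as in get_correlations3 further down; something like:

```julia
# Assumed definitions for za, zb, zc used above.
la = length(a) - 1
za, zb, zc = zscores(a) ./ la, zscores(b) ./ la, zscores(c) ./ la
```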

But then it's actually slower.

```julia
julia> @btime correlations2 = [zscores(x)'*y for y in [$za, $zb, $zc] for x in $xs];
  76.307 ms (60012 allocations: 465.88 MiB)

julia> @btime correlations = [cor(x, y) for y in [$a, $b, $c] for x in $xs];
  19.886 ms (12 allocations: 812.81 KiB)
```

We could use a non-allocating zscores!, but it's still slower:

```julia
function zscores!(x)
    μ = mean(x)
    σ = std(x; mean=μ)
    @. x = (x - μ) / σ   # overwrites x with its z-scores in place
    x
end
```

```julia
julia> @btime correlations2 = [zscores!(x)'*y for y in [$za, $zb, $zc] for x in $xs];
  29.185 ms (12 allocations: 812.81 KiB)
```

Thanks for the correction!

Note that your version is calculating zscores(x) three times per loop. If I use a version that calculates it once, plus the non-allocating zscores!, it's still slightly slower.

```julia
a_length = 10_000
a = rand(a_length)
b = rand(a_length)
c = rand(a_length)
xs = [rand(a_length) for i in 1:10_000]

function get_correlations1(xs, a, b, c)
    return [[cor(x, y) for y in [a, b, c]] for x in xs]
end

@btime correlations = get_correlations1($xs, $a, $b, $c)
# 382.133 ms (20002 allocations: 1.60 MiB)
```

```julia
function get_correlations3(xs, a, b, c)
    la = length(a) - 1
    # Fold the 1/(n - 1) factor into the z-scores of a, b, c up front.
    # Note: zscores! mutates, so a, b, c, and the xs are overwritten.
    za, zb, zc = zscores!.([a, b, c]) ./ la
    output = Vector{Float64}[]
    for x in xs
        zx = zscores!(x)
        push!(output, [zx' * y for y in [za, zb, zc]])
    end
    return output
end

@btime correlations3 = get_correlations3($xs, $a, $b, $c)
# 425.309 ms (60026 allocations: 766.16 MiB)
```

Iβm very confused, because I have a production use case where the zscores version is much, much faster, that I just tested again. So I need to figure out what the difference is. In the meantime, Iβve taken down the post.

Iβve now fixed this bug, and the current version of my code runs ~33% faster than the original version β much less exciting. Still not sure why Iβm getting a much larger effect in production β my best guess is that thereβs something efficient about the dot product on hundreds of millions of rows that doesnβt really show up on smaller datasets.


<shameless_self_promotion>

If you read this thread, you might be interested in the package FastLocalCorrelationCoefficients.jl.

Notable characteristics:

• supports any number of dimensions (shapes)
• uses the fast Fourier transform for high performance
• allows precomputation of common calculations to locate multiple "needles" of the same size/shape in a "haystack"
• utilizes multithreading and distributed processing

Help us make FastLocalCorrelationCoefficients.jl better.
