Hi, sorry this is kind of a beginner/performance question. I want to generate a matrix of all of the pairwise differences among the rows of a matrix. The code below works, but gets slow as matrices get large (e.g., 1000 x 200 )
c = [1 2 3 ; 4 5 6 ; 7 8 9 ; 10 11 12];
cdist = [0 0 0];
for j = 1:3:
for h = (j+1):4
global cdist
cdist = vcat(cdist, abs.(c[j, :] - c[h, :])')
end
end
cdist = cdist[2:(size(cdist)[1]), :];
I read in the performance tips that it’s preferable to avoid using global variables, but I don’t understand how to output the matrix at the end without doing this…
Thanks in advance!
Welcome! Since I saw you edited your post a couple of times to try to fix the formatting, you might want to take a look at PSA: how to quote code with backticks which explains how to quote your code properly.
To answer your actual question, there are a lot of things we can do to improve this code. The first one is to, as you mentioned, avoid using global variables. This is easy if you put your code into a function, which is always a good idea:
julia> function distance_matrix(c)
cdist = [0 0 0]
for j = 1:3
for h = (j+1):4
cdist = vcat(cdist, abs.(c[j, :] - c[h, :])')
end
end
cdist = cdist[2:(size(cdist)[1]), :];
return cdist
end
distance_matrix (generic function with 1 method)
julia> c = [1 2 3 ; 4 5 6 ; 7 8 9 ; 10 11 12];
julia> distance_matrix(c)
6×3 Array{Int64,2}:
3 3 3
6 6 6
9 9 9
3 3 3
6 6 6
3 3 3
Before we work on performance of the code, the first thing to do is to make sure it gives the correct output. Is this what you expect to get?
Ok, then we should measure the performance so we know if we’re having an effect:
julia> using Pkg
julia> Pkg.add("BenchmarkTools")
julia> using BenchmarkTools
julia> @btime distance_matrix($c)
1.134 μs (44 allocations: 4.36 KiB)
1 microsecond is indeed pretty slow for an operation on such a small matrix. There are several things in this code that are likely to be slow:
It creates a lot of intermediate vectors and copies: When you do c[j, :], you create a copy of that row of the matrix, which is wasteful in this case. You can do @view(c[j, :]) to avoid that copy. That won’t make a big difference when each row is three elements, but it’s important for larger slices.
It repeatedly calls vcat to build up a matrix, which is going to be pretty slow because vcat-int a matrix requires allocating a whole new matrix to hold the result, copying the data over, and then adding the new row.
One easy way to get really good performance in this case is to pre-allocate the entire result matrix and then write out the loop:
# Returns the nth triangle number, i.e. the sum of
# 1 + 2 + ... + n
#
# (I learned this algorithm from a wonderful book called "The Number Devil"
# when I was a kid).
function triangle_number(n)
if iseven(n)
(n ÷ 2) * (n + 1)
else
((n + 1) ÷ 2) * n
end
end
function distance_matrix2(c)
# Allocate a matrix to hold the result. The number of rows
# depends on the `n - 1`th triangle number, where n is the number
# of rows of `c`. Hooray Math!
cdist = zeros(eltype(c), triangle_number(size(c, 1) - 1), size(c, 2))
row = 1
for j in 1:(size(c, 1) - 1)
for h in (j + 1):size(c, 1)
for k in 1:size(c, 2)
cdist[row, k] = abs(c[j, k] - c[h, k])
end
row += 1
end
end
return cdist
end
Of course we can’t just stop here, because it wouldn’t be Julia if we didn’t try to make it perfect. The next issue I see with the code is that the inner-most loop is over each row of the matrix c. Iterating over the rows of a matrix in Julia is, in general, slower than iterating over the columns. That’s because matrices in Julia are stored in column-major layout, so each column is contiguous in memory. Note that this is similar to Matlab and Fortran but different than Numpy and C for boring historical reasons.
We can make the operations a bit faster, especially on large matrices, if you’re willing to decide to operate over columns instead of rows:
function distance_matrix3(c)
cdist = zeros(eltype(c), size(c, 1), triangle_number(size(c, 2) - 1))
col = 1
for j in 1:(size(c, 2) - 1)
for h in (j + 1):size(c, 2)
for k in 1:size(c, 1)
cdist[k, col] = abs(c[k, j] - c[k, h])
end
col += 1
end
end
return cdist
end
Next, you might notice that we’re creating cdist as a matrix of zeros and then setting every element. That means we’re wasting time filling the initial array with zeros. We can avoid that by constructing a matrix with undef, which tells Julia to just allocate memory but not fill it with anything:
function distance_matrix4(c)
cdist = Matrix{eltype(c)}(undef, size(c, 1), triangle_number(size(c, 2) - 1))
col = 1
for j in 1:(size(c, 2) - 1)
for h in (j + 1):size(c, 2)
for k in 1:size(c, 1)
cdist[k, col] = abs(c[k, j] - c[k, h])
end
col += 1
end
end
return cdist
end
Very slightly faster, but the return on effort is diminishing at this point.
There’s more you can do, like using @inbounds to avoid bounds checks, using @simd to vectorize your innermost loop, and using StaticArrays.jl to represent your data as a collection of small vectors rather than a matrix. I won’t go into detail on that here because it’s been covered in a lot of other posts on this forum, but you can dig as deep as you’re interested.
Overall, though, I think the message is that if you:
Pre-allocate your outputs when possible
Avoid accidentally creating lots of copies
Don’t be afraid to write out a loop
then you can write Julia which is pretty close to optimal in terms of performance without going too far down the rabbit hole.
This final code brought things from not finishing after > 1hr to finishing in about a second for a 132 x 533 matrix (on a laptop from 2012). I’m amazed.