Efficient way of computing norm between all the rows of a matrix

vinicius_de_lima · October 24, 2021, 12:10am

I stored points in the rows of a 999900 x 2 matrix and tried to count the number of pairs of these points that are at a distance smaller than r, but my implementation was too slow, there is a more efficient way to achieve this? Below is my attempt:

function C(pontos::T,r::Real) where T <: AbstractMatrix{<:Real}
    contar = 0
    for r1 in 1:size(pontos,1)
        for r2 in r1+1:size(pontos,1)
            if norm(pontos[r1,:] - pontos[r2,:]) < r
                contar += 1 
            end
        end
    end
    return contar
end

stevengj · October 24, 2021, 1:06am

Try changing your data structure to a 1d array of 2-component SVectors (from StaticArrays.jl). That is a vastly more efficient way to do computations with small fixed-length vectors (e.g. coordinates in 2d).

For example, it will eliminate all the heap allocations you are doing in your inner loops, and it will inline all of the 2-component vector operations (unrolling the 2-iteration loops).

Oscar_Smith · October 24, 2021, 2:06am

also, computing norm squared is generally faster since it doesn’t require a square root.

lmiq · October 24, 2021, 2:50am

If r is significantly smaller than the average distance of the points, you can do that much, much faster using cell lists. For instance using CellListMap.jl. if you can provide an actual example and need some help, let me know.

vinicius_de_lima · October 24, 2021, 4:49am

Thank you, I believe that in fact r will attend this condition, in my use case it will assume values in the range 2^-8 to 2^-2 and the points are from a Henon map, here is the matrix I used.

lmiq · October 24, 2021, 12:28pm

This kind of data is not the one where cell lists shine the most, because it is very sparse (cell lists are ideal for more or less homogeneous distributions of points). Still, there it goes (I have adapted the naive version to use static arrays):

julia> include("./vinicius.jl")
cell list - serial: 
  5.258 s (13576 allocations: 68.23 MiB)
cell list - parallel: 
  2.096 s (77501 allocations: 131.05 MiB)
naive - serial: 
# still running, will update when finished...

Here is the code:

using CellListMap
using StaticArrays
using DelimitedFiles
using LinearAlgebra: norm
using BenchmarkTools

function count_close(p::AbstractVector{SVector{N,T}},r;parallel=true) where {N,T}
    p_length = limits(p) # length of set on each dimension
    # To prevent too many cells, set the cell side such that the maximum number of cells 
    # is at most equal to the number of points
    max_length = maximum(p_length.limits) 
    max_bins = max_length/r
    n = ceil(Int,min(max_bins,length(p)^(1/N))) 
    cell_side = max(r,max_length/n)
    # Prepare cell lists
    box = Box(p_length,cell_side)
    cl = CellList(p,box,parallel=parallel)
    r2 = r^2
    n_close = 0
    n_close = map_pairwise!(
        (x,y,i,j,d2,n_close) -> d2 < r2 ? n_close + 1 : n_close,
        n_close,
        box, cl, parallel=parallel
    )
    return n_close
end

data = readdlm("pontos.txt")
pontos = [ SVector{2,Float64}(data[i,1],data[i,2]) for i in axes(data,1) ]
pontos = pontos[1:100000]
r = 2e-2

println("cell list - serial: ")
@btime count_close($pontos,$r,parallel=false)

println("cell list - parallel: ")
@btime count_close($pontos,$r,parallel=true)

function C(pontos,r)
    contar = 0
    for r1 in 1:length(pontos)
        for r2 in r1+1:length(pontos)
            if norm(pontos[r2]-pontos[r1]) < r
                contar += 1 
            end
        end
    end
    return contar
end

println("naive - serial: ")
@btime C(pontos,r)

up: For 100_000 points this is the difference (for one million I think it will never end, had to kill it):

julia> include("./vinicius.jl")
cell list - serial: 
  74.476 ms (8778 allocations: 7.61 MiB)
cell list - parallel: 
  27.496 ms (40732 allocations: 17.19 MiB)
naive - serial: 
  8.912 s (1 allocation: 16 bytes)
15579644

Topic		Replies	Views
"vecnorm" : column/row-wise norm New to Julia	5	9127	May 21, 2021
Computing L1 norm between vectors: julian way Performance	14	1810	April 6, 2021
Efficient way to calculate the trace distance New to Julia linearalgebra	1	209	March 31, 2024
Column/row-wise norm New to Julia question	1	1095	December 9, 2020
The most efficient way to calculate the pairwise correlation between rows of a large Matrix{Float64} Performance performance	10	759	May 25, 2023

Efficient way of computing norm between all the rows of a matrix

Related topics